Fix race condition in SstFileManagerImpl error recovery code (#9435)
Summary: There is a race in SstFileManagerImpl between the ClearError() function and CancelErrorRecovery(). The race can cause ClearError() to deref the file system pointer after it has been freed. This is likely to occur during process shutdown, when the order of destruction of the DB/Env/FileSystem and SstFileManagerImpl is not deterministic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9435 Test Plan: Reproduce the crash in a TSAN build by introducing sleeps in the code, and verify with the fix. Reviewed By: siying Differential Revision: D33774696 Pulled By: anand1976 fbshipit-source-id: 643d3da31b8d2ee6d9b6db5d33327e0053ce3b83
This commit is contained in:
parent
8822562d75
commit
beb86addeb
@ -31,6 +31,7 @@ Note: The next release will be major release 7.0. See https://github.com/faceboo
|
||||
* Fix a bug that FlushMemTable may return ok even flush not succeed.
|
||||
* Fixed a bug of Sync() and Fsync() not using `fcntl(F_FULLFSYNC)` on OS X and iOS.
|
||||
* Fixed a significant performance regression in version 6.26 when a prefix extractor is used on the read path (Seek, Get, MultiGet). (Excessive time was spent in SliceTransform::AsString().)
|
||||
* Fixed a race condition in SstFileManagerImpl error recovery code that can cause a crash during process shutdown.
|
||||
|
||||
### New Features
|
||||
* Added RocksJava support for MacOS universal binary (ARM+x86)
|
||||
|
@ -256,7 +256,7 @@ void SstFileManagerImpl::ClearError() {
|
||||
while (true) {
|
||||
MutexLock l(&mu_);
|
||||
|
||||
if (closing_) {
|
||||
if (error_handler_list_.empty() || closing_) {
|
||||
return;
|
||||
}
|
||||
|
||||
@ -297,7 +297,8 @@ void SstFileManagerImpl::ClearError() {
|
||||
|
||||
// Someone could have called CancelErrorRecovery() and the list could have
|
||||
// become empty, so check again here
|
||||
if (s.ok() && !error_handler_list_.empty()) {
|
||||
if (s.ok()) {
|
||||
assert(!error_handler_list_.empty());
|
||||
auto error_handler = error_handler_list_.front();
|
||||
// Since we will release the mutex, set cur_instance_ to signal to the
|
||||
// shutdown thread, if it calls // CancelErrorRecovery() the meantime,
|
||||
|
Loading…
Reference in New Issue
Block a user