rocksdb/db/db_impl
Levi Tamasi 3f7e929865 Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605)
Summary:
The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk
unlocks the DB mutex before destroying the `ThreadLocalPtr` holding
the per-thread `SuperVersion` pointers when the only remaining reference
is the back reference from `super_version_`. The idea behind this was to
break the circular dependency between `ColumnFamilyData` and `SuperVersion`:
when the penultimate reference goes away, `ColumnFamilyData` can clean up
the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there
is a `SuperVersion` and it is not referenced by anything else.) However,
unlocking the mutex throws a wrench in this plan by making it possible for another thread
to jump in and take another reference to the `ColumnFamilyData`, keeping the
object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like
https://github.com/facebook/rocksdb/issues/8440 ,
https://github.com/facebook/rocksdb/issues/8382 ,
and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet`
destructor we sometimes observe during close in our stress tests.
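
To make the window concrete, here is a minimal single-threaded sketch of the sequence described above. It is a toy model, not the RocksDB implementation: `ToyMutex`, `ToyColumnFamily`, `UnrefWithUnlockWindow`, and `RunZombieScenario` are hypothetical names, the mutex is a plain flag, and the "other thread" is simulated by a callback invoked while the flag is released.

```cpp
#include <cassert>
#include <functional>

namespace unlock_window_sketch {

// Toy stand-in for the DB mutex: a flag, usable only in this
// single-threaded simulation (hypothetical, not a RocksDB type).
struct ToyMutex {
  bool held = false;
  bool TryLock() {
    if (held) return false;
    held = true;
    return true;
  }
  void Unlock() {
    assert(held);
    held = false;
  }
};

// Toy stand-in for ColumnFamilyData (hypothetical).
struct ToyColumnFamily {
  int refs = 0;
  bool has_super_version = false;  // the super version owns one back reference
  bool has_local_cache = false;    // stand-in for the ThreadLocalPtr
  bool destroyed = false;
};

// Models the flow with the unlock/relock in place. `intruder` plays the
// role of another thread that runs while the mutex is released.
// Precondition: the caller holds `mu`. Returns true if `cfd` was destroyed.
bool UnrefWithUnlockWindow(ToyColumnFamily& cfd, ToyMutex& mu,
                           const std::function<void()>& intruder) {
  assert(mu.held);
  --cfd.refs;
  if (cfd.refs == 0) {
    cfd.destroyed = true;
    return true;
  }
  if (cfd.refs == 1 && cfd.has_super_version) {
    mu.Unlock();                   // window opens
    cfd.has_local_cache = false;   // per-thread cache is torn down
    intruder();                    // another thread may run here
    bool relocked = mu.TryLock();  // window closes
    assert(relocked);
    (void)relocked;
    cfd.has_super_version = false;
    --cfd.refs;                    // super version drops its back reference
    if (cfd.refs == 0) {
      cfd.destroyed = true;
      return true;
    }
  }
  return false;
}

// Runs the scenario and returns the final state for inspection.
ToyColumnFamily RunZombieScenario() {
  ToyMutex mu;
  ToyColumnFamily cfd;
  cfd.refs = 2;  // one external reference + the back reference
  cfd.has_super_version = true;
  cfd.has_local_cache = true;

  // The simulated other thread takes a new reference under the mutex.
  auto intruder = [&] {
    if (mu.TryLock()) {
      ++cfd.refs;
      mu.Unlock();
    }
  };

  bool locked = mu.TryLock();
  assert(locked);
  (void)locked;
  UnrefWithUnlockWindow(cfd, mu, intruder);
  mu.Unlock();
  return cfd;
}

}  // namespace unlock_window_sketch
```

In this model the object returned by `RunZombieScenario()` is still alive (one reference, not destroyed) but has lost its per-thread cache, which corresponds to the zombie state described above.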

Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original
rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it could call
`SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone.
https://github.com/facebook/rocksdb/pull/3510 fixed the deadlock but left the
unlocking in place. https://github.com/facebook/rocksdb/pull/6147 then introduced
the circular dependency and associated cleanup logic described above (in order
to enable iterators to keep the `ColumnFamilyData` for dropped column families alive),
and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`.
Finally, https://github.com/facebook/rocksdb/pull/7749 fixed a memory leak but
apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete`
in `SuperVersion::Cleanup`.

The patch simply eliminates the unlocking and relocking, which has been unnecessary
ever since https://github.com/facebook/rocksdb/pull/3510 made `SuperVersionUnrefHandle` lock-free.
This closes the window during which another thread could increase the reference count,
and hopefully fixes the issues above.
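
As a rough illustration of why keeping the mutex held closes the window, the following single-threaded sketch models the cleanup without the unlock/relock. It is a toy under stated assumptions, not the actual patch: `ToyMutex`, `ToyColumnFamily`, `UnrefHoldingMutex`, and `RunHeldMutexScenario` are hypothetical names, and a concurrent thread that would block on the real mutex is modeled as a callback whose `TryLock` fails.

```cpp
#include <cassert>
#include <functional>

namespace held_mutex_sketch {

// Toy stand-in for the DB mutex: a flag, usable only in this
// single-threaded simulation (hypothetical, not a RocksDB type).
struct ToyMutex {
  bool held = false;
  bool TryLock() {
    if (held) return false;
    held = true;
    return true;
  }
  void Unlock() {
    assert(held);
    held = false;
  }
};

// Toy stand-in for ColumnFamilyData (hypothetical).
struct ToyColumnFamily {
  int refs = 0;
  bool has_super_version = false;  // the super version owns one back reference
  bool has_local_cache = false;    // stand-in for the ThreadLocalPtr
  bool destroyed = false;
};

// Models the cleanup with the mutex held from start to finish.
// Precondition: the caller holds `mu`. Returns true if `cfd` was destroyed.
bool UnrefHoldingMutex(ToyColumnFamily& cfd, ToyMutex& mu,
                       const std::function<void()>& intruder) {
  assert(mu.held);
  --cfd.refs;
  if (cfd.refs == 0) {
    cfd.destroyed = true;
    return true;
  }
  if (cfd.refs == 1 && cfd.has_super_version) {
    cfd.has_local_cache = false;  // per-thread cache is torn down
    intruder();                   // a concurrent thread cannot get the mutex
    cfd.has_super_version = false;
    --cfd.refs;                   // super version drops its back reference
    if (cfd.refs == 0) {
      cfd.destroyed = true;
      return true;
    }
  }
  return false;
}

// Runs the scenario and returns the final state for inspection.
ToyColumnFamily RunHeldMutexScenario() {
  ToyMutex mu;
  ToyColumnFamily cfd;
  cfd.refs = 2;  // one external reference + the back reference
  cfd.has_super_version = true;
  cfd.has_local_cache = true;

  // The simulated other thread only takes a reference if it gets the mutex.
  auto intruder = [&] {
    if (mu.TryLock()) {
      ++cfd.refs;
      mu.Unlock();
    }
  };

  bool locked = mu.TryLock();
  assert(locked);
  (void)locked;
  UnrefHoldingMutex(cfd, mu, intruder);
  mu.Unlock();
  return cfd;
}

}  // namespace held_mutex_sketch
```

Here the simulated intruder never acquires the mutex, so the reference count drops to zero and the toy object is destroyed in a single pass.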

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8605

Test Plan: Ran `make check` and stress tests locally.

Reviewed By: pdillinger

Differential Revision: D30051035

Pulled By: ltamasi

fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528
2021-08-02 18:12:11 -07:00
..
compacted_db_impl.cc Make backups openable as read-only DBs (#8142) 2021-04-06 14:37:53 -07:00
compacted_db_impl.h Move compacted_db_impl.[c|h] to db/db_impl (#8082) 2021-03-23 13:49:26 -07:00
db_impl_compaction_flush.cc Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605) 2021-08-02 18:12:11 -07:00
db_impl_debug.cc Eliminate compiler complaining, which the return type of the function… (#8498) 2021-07-08 10:09:05 -07:00
db_impl_experimental.cc Replace reinterpret_cast with static_cast_with_check (#7067) 2020-07-02 19:25:41 -07:00
db_impl_files.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
db_impl_open.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
db_impl_readonly.cc Make backups openable as read-only DBs (#8142) 2021-04-06 14:37:53 -07:00
db_impl_readonly.h RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) 2020-09-17 15:41:25 -07:00
db_impl_secondary.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
db_impl_secondary.h Implement missing Handler methods in ColumnFamilyCollector. (#8456) 2021-07-12 09:23:45 -07:00
db_impl_write.cc Several simple local code clean-ups (#8565) 2021-07-30 12:07:49 -07:00
db_impl.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
db_impl.h Delete legacy code not used any more. (#8508) 2021-07-14 16:04:56 -07:00