rocksdb/db/db_impl
Akanksha Mahajan 596e9008e4 Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898)
Summary:
When WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.

Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.

DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.

If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
to complete and avoid continous stalling if memory usage remains close to buffer_size.

WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
Their state is changed to State::Running and they are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).

DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898

Test Plan: Added a new test db/db_write_buffer_manager_test.cc

Reviewed By: anand1976

Differential Revision: D26093227

Pulled By: akankshamahajan15

fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae
2021-04-21 13:54:02 -07:00
..
compacted_db_impl.cc Make backups openable as read-only DBs (#8142) 2021-04-06 14:37:53 -07:00
compacted_db_impl.h Move compacted_db_impl.[c|h] to db/db_impl (#8082) 2021-03-23 13:49:26 -07:00
db_impl_compaction_flush.cc Fix possible hang issue in ~DBImpl() when flush is scheduled in LOW pool (#8125) 2021-03-30 18:35:20 -07:00
db_impl_debug.cc Do not set bg error for compaction in retryable IO Error case (#7899) 2021-01-27 17:58:12 -08:00
db_impl_experimental.cc Replace reinterpret_cast with static_cast_with_check (#7067) 2020-07-02 19:25:41 -07:00
db_impl_files.cc Handle rename() failure in non-local FS (#8192) 2021-04-19 18:11:13 -07:00
db_impl_open.cc Handle rename() failure in non-local FS (#8192) 2021-04-19 18:11:13 -07:00
db_impl_readonly.cc Make backups openable as read-only DBs (#8142) 2021-04-06 14:37:53 -07:00
db_impl_readonly.h RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) 2020-09-17 15:41:25 -07:00
db_impl_secondary.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
db_impl_secondary.h RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) 2020-09-17 15:41:25 -07:00
db_impl_write.cc Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
db_impl.cc Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
db_impl.h Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00