Commit Graph

13 Commits

Author SHA1 Message Date
Hui Xiao
3573558ec5 Rewrite memory-charging feature's option API (#9926)
Summary:
**Context:**
Previous PR https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, https://github.com/facebook/rocksdb/pull/8428 added separate flag for each charged memory area. Such API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`.

Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags.

**Summary:**
- Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that
- Added missing db bench/stress test for some memory charging features
- Renamed related test suite to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926

Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR  -charge_compression_dictionary_building_buffer=1(remove this for comparison)  -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox  -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D36054712

Pulled By: hx235

fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
2022-05-17 15:01:51 -07:00
Hui Xiao
74cfe7db60 Refactor WriteBufferManager::CacheRep into CacheReservationManager (#8506)
Summary:
Context:
To help cap various memory usage by a single limit of the block cache capacity, we charge the memory usage through inserting/releasing dummy entries in the block cache. CacheReservationManager is such a class (non thread-safe) responsible for  inserting/removing dummy entries to reserve cache space for memory used by the class user.

- Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428

- Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838

- Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation.

- Adjusted the previous dummy entry release algorithm in decreasing cache reservation to be loop-releasing dummy entries to stay symmetric to dummy entry insertion algorithm

- Made the previous dummy entry release algorithm in delayed decrease mode more aggressive for better decreasing cache reservation when memory used is less likely to increase back.

  Previously, the algorithms only release 1 dummy entries when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used.
Now, the algorithms loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_.

- Updated WriteBufferManager's test cases to adapt to changes on the release algorithm mentioned above and left comment for some test cases for clarity

- Replaced the previous cache key prefix generation (utilizing object address related to the cache client) with one that utilizes Cache->NewID() to prevent cache-key collision among dummy entry clients sharing the same cache.

  The specific collision we are preventing happens when the object address is reused for a new cache-key prefix while the old cache-key using that same object address in its prefix still exists in the cache. This could happen due to that, under LRU cache policy, there is a possible delay in releasing a cache entry after the cache client object owning that cache entry get deallocated. In this case, the object address related to the cache client object can get reused for other client object to generate a new cache-key prefix.

  This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506

Test Plan:
- Passing the added unit tests cache_reservation_manager_test.cc
- Passing existing and adjusted write_buffer_manager_test.cc

Reviewed By: ajkr

Differential Revision: D29644135

Pulled By: hx235

fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7
2021-08-24 12:43:31 -07:00
matthewvon
4126bdc0e1 Feature: add SetBufferSize() so that managed size can be dynamic (#7961)
Summary:
This PR adds SetBufferSize() to the WriteBufferManager object.  This enables user code to adjust the global budget for write_buffers based upon other memory conditions such as growth in table reader memory as the dataset grows.

The buffer_size_ member variable is now atomic to match design of other changeable size_t members within WriteBufferManager.

This change is useful as is.  However, this change is also essential if someone decides they wanted to enable db_write_buffer_size modifications through the DB::SetOptions() API, i.e. no waste taking this as is.

Any format / spacing changes are due to clang-format as required by check-in automation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7961

Reviewed By: ajkr

Differential Revision: D26639075

Pulled By: akankshamahajan15

fbshipit-source-id: 0604348caf092d35f44e85715331dc920e5c1033
2021-03-03 14:22:11 -08:00
mrambacher
12f1137355 Add a SystemClock class to capture the time functions of an Env (#7858)
Summary:
Introduces and uses a SystemClock class to RocksDB.  This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock.

Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.

There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).

Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858

Reviewed By: pdillinger

Differential Revision: D26006406

Pulled By: mrambacher

fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
2021-01-25 22:09:11 -08:00
Akanksha Mahajan
8ed680bdb0 Add new API to report dummy entries size in cache in WriteBufferManager (#7837)
Summary:
Add new API WriteBufferManager::dummy_entries_in_cache_usage() which reports the dummy entries size stored in cache to account for DataBlocks in WriteBufferManager.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7837

Test Plan: Updated test ./write_buffer_manager_test

Reviewed By: ajkr

Differential Revision: D25794312

Pulled By: akankshamahajan15

fbshipit-source-id: 197f5e8701e3dc57a7df72dab1735624f90daf4b
2021-01-08 13:26:24 -08:00
sdong
57096ab13e Fix a bug that crashes the service when write buffer manager fails to insert to block cache (#6619)
Summary:
https://github.com/facebook/rocksdb/issues/6247 reports that when write buffer manager fails to insert the dummy entry to block cache, null pointer is still stored and used to release the handle and cause corruption. Fix the bug by not releasing it with null handle.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6619

Test Plan: Add a unit test that fails without the fix.

Reviewed By: ajkr

Differential Revision: D20776769

fbshipit-source-id: 4127fbd9f295a0a3e45774746ffcd91f939f6287
2020-04-01 11:27:40 -07:00
sdong
fdf882ded2 Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433

Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.

Differential Revision: D19977691

fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
2020-02-20 12:09:57 -08:00
Maysam Yabandeh
638d239507 Charge block cache for cache internal usage (#5797)
Summary:
For our default block cache, each additional entry has extra memory overhead. It include LRUHandle (72 bytes currently) and the cache key (two varint64, file id and offset). The usage is not negligible. For example for block_size=4k, the overhead accounts for an extra 2% memory usage for the cache. The patch charging the cache for the extra usage, reducing untracked memory usage outside block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor.
This PR builds up on https://github.com/facebook/rocksdb/issues/4258
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797

Test Plan:
- Existing tests are updated to either disable the feature when the test has too much dependency on the old way of accounting the usage or increasing the cache capacity to account for the additional charge of metadata.
- The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata.

Differential Revision: D17396833

Pulled By: maysamyabandeh

fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f
2019-09-16 15:26:21 -07:00
Siying Dong
e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Siying Dong
beb44ec3eb WriteBufferManager's dummy entry size to block cache 1MB -> 256KB (#5175)
Summary:
Dummy cache size of 1MB is too large for small block sizes. Our GetDefaultCacheShardBits() use min_shard_size = 512L * 1024L to determine number of shards, so 1MB will excceeds the size of the whole shard and make the cache excceeds the budget.
Change it to 256KB accordingly.
There shouldn't be obvious performance impact, since inserting a cache entry every 256KB of memtable inserts is still infrequently enough.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5175

Differential Revision: D14954289

Pulled By: siying

fbshipit-source-id: 2c275255c1ac3992174e06529e44c55538325c94
2019-04-16 12:03:07 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Siying Dong
af1746751f WriteBufferManager will not trigger flush if much data is already being flushed
Summary:
Even if hard limit hits, flushing more memtable may not help cap the memory usage if already more than half data is scheduled for flush. Not triggering flush instead.
Closes https://github.com/facebook/rocksdb/pull/2469

Differential Revision: D5284249

Pulled By: siying

fbshipit-source-id: 8ab7ba1aba56a634dbe72b318fcab2093063972e
2017-06-21 10:41:37 -07:00
Siying Dong
95b0e89b5d Improve write buffer manager (and allow the size to be tracked in block cache)
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350

Differential Revision: D5110648

Pulled By: siying

fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
2017-06-02 14:26:56 -07:00