3b981eaa1d
Summary: In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with -use_clock_cache, as well as db_bench -use_clock_cache, but not single-threaded. Smaller cache size hits failure much faster. ASAN reported the failure as calling malloc_usable_size on the `key` pointer of a ClockCache handle after it was reportedly freed. On detailed inspection I found this bad sequence of operations for a cache entry:

state=InCache=1,refs=1
[thread 1] Start ClockCacheShard::Unref (from Release, no mutex)
[thread 1] Decrement ref count
state=InCache=1,refs=0
[thread 1] Suspend before CalcTotalCharge (no mutex)

[thread 2] Start UnsetInCache (from Insert, mutex held)
[thread 2] clear InCache bit
state=InCache=0,refs=0
[thread 2] Calls RecycleHandle (based on pre-updated state)
[thread 2] Returns to Insert which calls Cleanup which deletes `key`

[thread 1] Resume ClockCacheShard::Unref
[thread 1] Read `key` in CalcTotalCharge

To fix this, I've added a field to the handle to store the metadata charge so that we can efficiently remember everything we need from the handle in Unref. We must not read from the handle again if we decrement the count to zero with InCache=1, which means we don't own the entry and someone else could eject/overwrite it immediately.

Note that before this change, on amd64 sizeof(Handle) == 56 even though there are only 48 bytes of data. Grouping together the uint32_t fields would cut it down to 48, but I've added another uint32_t, which takes it back up to 56. Not a big deal.

Also fixed DisownData to cooperate with ASAN as in LRUCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261

Test Plan: Manual + adding use_clock_cache to db_crashtest.py

Base performance
./cache_bench -use_clock_cache
Complete in 17.060 s; QPS = 2458513
New performance
./cache_bench -use_clock_cache
Complete in 17.052 s; QPS = 2459695
Any difference is easily buried in small noise.

Crash test shows still more bug(s) in ClockCache, so I'm expecting to disable ClockCache from production code in a follow-up PR (if we can't find and fix the bug(s)).

Reviewed By: mrambacher

Differential Revision: D28207358

Pulled By: pdillinger

fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294
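To illustrate the race and the shape of the fix described in the summary, here is a minimal C++ sketch. It is not the actual RocksDB ClockCache code: the handle layout, field names, and the packing of InCache and the ref count into one atomic word are simplified assumptions made only for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a ClockCache handle (layout is an assumption).
struct HandleSketch {
  char* key = nullptr;                // heap-allocated key bytes
  size_t key_length = 0;
  size_t charge = 0;                  // user-supplied charge
  size_t total_charge = 0;            // charge + metadata, computed once at Insert
  std::atomic<uint32_t> flags{0};     // bit 0: InCache; bits 1..31: ref count
};

size_t UnrefSketch(HandleSketch* h) {
  // Snapshot everything we might need from the handle *before* dropping the
  // reference; total_charge is the precomputed metadata-inclusive charge.
  const size_t total_charge = h->total_charge;

  // Drop our reference (the ref count lives in the upper bits of `flags`).
  const uint32_t prev = h->flags.fetch_sub(1u << 1, std::memory_order_acq_rel);
  const bool dropped_last_ref = (prev >> 1) == 1;
  const bool still_in_cache = (prev & 1u) != 0;

  if (dropped_last_ref && still_in_cache) {
    // We no longer own the entry: Insert on another thread may clear InCache,
    // recycle the handle, and delete `key` at any moment. The buggy pattern
    // recomputed the charge here by dereferencing `key` (e.g. via
    // malloc_usable_size), reading freed memory. The fix is to use only the
    // values snapshotted above and never touch the handle again.
  }
  return total_charge;
}
```

The essential point mirrors the commit: the metadata-inclusive charge is stored in the handle and read before the decrement, so Unref never has to dereference the handle after dropping its last reference while the entry is still InCache.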