Summary:
The global atomics we previously used for tickers had poor cache performance
since they were typically updated from different threads, causing frequent
invalidations. In this diff,
- recordTick() updates a local ticker value specific to the thread in which it was called
- When a thread exits, its local ticker value is added into merged_sum
- getTickerCount() returns the sum of all threads' local ticker values and the merged_sum
- setTickerCount() resets all threads' local ticker values and sets merged_sum to the value provided by the caller.
In a next diff I will make a similar change for histogram stats.
Test Plan:
before:
$ TEST_TMPDIR=/dev/shm/ perf record -g ./db_bench --benchmarks=readwhilewriting --statistics --num=1000000 --use_existing_db --threads=64 --cache_size=250000000 --compression_type=lz4
$ perf report -g --stdio | grep recordTick
7.59% db_bench db_bench [.] rocksdb::StatisticsImpl::recordTick
...
after:
$ TEST_TMPDIR=/dev/shm/ perf record -g ./db_bench --benchmarks=readwhilewriting --statistics --num=1000000 --use_existing_db --threads=64 --cache_size=250000000 --compression_type=lz4
$ perf report -g --stdio | grep recordTick
1.46% db_bench db_bench [.] rocksdb::StatisticsImpl::recordTick
...
Reviewers: kradhakrishnan, MarkCallaghan, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: yiwu, andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D62337