Dhruba Borthakur 95dda37858 Move filesize-based-sorting to outside the Mutex
Summary:
When a new version is created, we sort all the files at every
level based on their size. This is necessary because we want
to compact the largest file first. The sorting takes quite a
bit of CPU.

Moved the sorting code to be outside the mutex. Also, the
earlier code was sorting files at all levels, but we do not
need to sort the highest-numbered level because those files
are never the cause of any compaction. To reduce sorting
costs further, we order only the first few (largest) files
in each level because it is likely that those are the only
files in that level that will be picked for compaction.
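
The idea can be sketched roughly as follows. This is only an
illustration and not the code in this patch: FileMeta, Version,
SortLargestFirst and InstallVersion are hypothetical names, and
std::partial_sort is assumed as one way to order just the first
few files of a level.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-file metadata record.
struct FileMeta {
  uint64_t number;
  uint64_t file_size;
};

// Hypothetical stand-in for a version: files[level] lists that level's files.
struct Version {
  std::vector<std::vector<FileMeta>> files;
};

// Move the top_n largest files of each level to the front, largest first.
// The highest-numbered level is skipped. Files past the first top_n are
// left in unspecified order.
void SortLargestFirst(Version* v, size_t top_n) {
  if (v->files.empty()) return;
  const size_t last_level = v->files.size() - 1;
  for (size_t level = 0; level < last_level; ++level) {
    std::vector<FileMeta>& f = v->files[level];
    const size_t n = std::min(top_n, f.size());
    std::partial_sort(f.begin(), f.begin() + n, f.end(),
                      [](const FileMeta& a, const FileMeta& b) {
                        return a.file_size > b.file_size;
                      });
  }
}

// Sort the new version without holding the lock, then take the mutex
// only to publish the result.
void InstallVersion(std::mutex& mu, Version& current, Version next,
                    size_t top_n) {
  SortLargestFirst(&next, top_n);  // CPU-heavy work, no lock held
  std::lock_guard<std::mutex> lock(mu);
  current = std::move(next);
}
```

With top_n much smaller than the number of files in a level,
the cost per level is roughly O(N log top_n) instead of
O(N log N), and none of it is paid while the mutex is held.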

At steady state, I have seen this patch increase
throughput from 1500 writes/sec to 1700 writes/sec at the
end of a 72-hour run. The CPU saving from not sorting the
last level was not noticeable in this test run because
there were only 100K files in the highest-numbered level.
I expect the CPU saving to be significant when the number of
files is much higher.

This is mostly an early preview and not ready for rigorous review.

With this patch, writes/sec is bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs.

Test Plan: make check

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D6411
2012-11-07 15:39:44 -08:00