only FALLOC_FL_PUNCH_HOLE when ftruncate is buggy

Summary:
In RocksDB, we sometimes preallocate the estimated space for a file to have better perf with fallocate (if supported). Usually it is a little bit bigger than the real resulting file size. At this time, we have to let the Filesystem reclaim the space not used.

Ideally, calling ftruncate to truncate the file to its real size should be enough. HOWEVER, it isn't on tmpfs, which we witness in our case, with some buggy kernel version. ftruncate a file with preallocated space doesn't change number of the blocks used by the file, which means the space not used by the file is not returned to the filesystems. So in this case we need fallocate with FALLOC_FL_PUNCH_HOLE to explicitly reclaim the used blocks. It is a hack to cope with the kernel bug and usually we should not need it.
Closes https://github.com/facebook/rocksdb/pull/2102

Differential Revision: D4848934

Pulled By: lightmark

fbshipit-source-id: f1b40b5
This commit is contained in:
Aaron Gao 2017-04-06 18:08:53 -07:00 committed by Facebook Github Bot
parent 343b59d6ee
commit 9e72939029

20
env/io_posix.cc vendored
View File

@ -768,10 +768,22 @@ Status PosixWritableFile::Close() {
// of the file, but it does. Simple strace report will show that.
// While we work with Travis-CI team to figure out if this is a
// quirk of Docker/AUFS, we will comment this out.
IOSTATS_TIMER_GUARD(allocate_nanos);
if (allow_fallocate_) {
fallocate(fd_, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, filesize_,
block_size * last_allocated_block - filesize_);
struct stat file_stats;
fstat(fd_, &file_stats);
// After ftruncate, we check whether ftruncate has the correct behavior.
// If not, we should hack it with FALLOC_FL_PUNCH_HOLE
if ((file_stats.st_size + file_stats.st_blksize - 1) /
file_stats.st_blksize !=
file_stats.st_blocks / (file_stats.st_blksize / 512)) {
fprintf(stderr,
"Your kernel is buggy (<= 4.0.x) and does not free preallocated"
"blocks on truncate. Hacking around it, but you should upgrade!"
"\n");
IOSTATS_TIMER_GUARD(allocate_nanos);
if (allow_fallocate_) {
fallocate(fd_, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, filesize_,
block_size * last_allocated_block - filesize_);
}
}
#endif
}