Most file systems try hard to keep files contiguous, but this can be
very difficult when they don't know how big the file will be. The result
is a lot of disk fragmentation, especially when lots of files are being
written at the same time or when the file system gets full.
A few years ago, Linux added a new system call, fallocate(), for
optional use by any application that knows in advance how big a file
will be.
fallocate() is implemented at the file system level, and not every Linux
file system supports it. Ext3 does not, but several other important ones
do, including XFS and ext4. It's especially useful with XFS and other
extent-based file systems.
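To make the call concrete, here's a minimal sketch of how an application
that knows a file's final size might reserve the space up front. The
helper name reserve_space() is my own, not part of any API; fallocate(2)
itself is Linux-specific and needs _GNU_SOURCE with glibc.

```c
/* Sketch: pre-allocating space for a file whose final size is known.
 * fallocate(2) is Linux-specific; reserve_space() is a made-up name. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve `len` bytes starting at offset 0 of the open file `fd`.
 * Returns 0 on success, otherwise the errno value from fallocate(). */
int reserve_space(int fd, off_t len)
{
    if (fallocate(fd, 0, 0, len) == 0)
        return 0;     /* space is reserved; the file size is extended */
    return errno;     /* e.g. EOPNOTSUPP on ext3, ENOSPC if disk is full */
}
```

With mode 0, as here, a successful call also extends the file size to
cover the allocated range, as if it had been written with zeroes.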
Calling fallocate() on a file system that doesn't support it simply
returns -1 (with errno set to EOPNOTSUPP) and no harm done. And nothing
keeps you from writing past the region specified by fallocate(),
although the extra space may not be contiguous with the previously
reserved space.
It seems to me that file de-archivers like unzip, tar, cpio and pax, as
well as remote file copy programs like rsync, really ought to invoke
fallocate() when available.
A successful fallocate() call also guarantees that writes within the
allocated region won't fail for lack of space. This seems like another
very useful feature, in that a file extraction utility could skip large
files for which there isn't any room instead of ungracefully running the
file system out of space and then deleting the partial copy. I suppose
there's room for debate as to whether the program ought to simply quit
when fallocate() fails or skip the file and continue extracting the
smaller files that do fit.
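The skip-and-continue behavior might look something like this sketch;
try_reserve() is a hypothetical helper, not taken from any actual
extractor:

```c
/* Sketch: deciding whether to extract a file based on fallocate().
 * try_reserve() is a hypothetical name, not from any real archiver. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if extraction should proceed (space reserved, or the file
 * system simply lacks fallocate support), 0 if the file should be
 * skipped for lack of room, -1 on any other error. */
int try_reserve(int fd, off_t size)
{
    if (fallocate(fd, 0, 0, size) == 0)
        return 1;             /* reserved: writes in range can't fail */
    if (errno == EOPNOTSUPP)
        return 1;             /* e.g. ext3: proceed the old-fashioned way */
    if (errno == ENOSPC) {
        fprintf(stderr, "skipping: %lld bytes won't fit\n",
                (long long)size);
        return 0;             /* skip this file, keep extracting others */
    }
    return -1;
}
```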
And if you really want to go all-out: if you know how much space will be
needed by all of the files you're extracting, you could provide the
option of allocating it all in advance and aborting without reading
anything if there isn't room for it all.
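An all-or-nothing pass could be sketched like this, assuming the member
names and sizes are known up front (as from an archive's table of
contents); preallocate_all() and unlink_upto() are hypothetical names:

```c
/* Sketch: all-or-nothing preallocation for a whole archive.  The
 * function names here are made up for illustration. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Remove the first n output files created so far. */
static void unlink_upto(const char **names, size_t n)
{
    for (size_t i = 0; i < n; i++)
        unlink(names[i]);
}

/* Create each output file and reserve its full size.  On any failure,
 * remove everything created so far and return -1; on success return 0
 * and extraction can proceed knowing the space is there. */
int preallocate_all(const char **names, const off_t *sizes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int fd = open(names[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            unlink_upto(names, i);
            return -1;
        }
        /* fallocate() rejects a zero length, so skip empty members. */
        int rc = sizes[i] > 0 ? fallocate(fd, 0, 0, sizes[i]) : 0;
        close(fd);
        if (rc != 0 && errno != EOPNOTSUPP) {
            unlink_upto(names, i + 1);  /* include the file just made */
            return -1;
        }
    }
    return 0;
}
```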
If I come up with some patches to put fallocate() into cpio, would the
upstream maintainers be willing to adopt them?