blocking factor on tar..maybe should be more clear in man page? (even info isn't clear)...

by Linda Walsh-6

I thought the blocking factor inserted a gap in the file-stream for
synchronization (from reading the info section on tar).

I thought that if tar used a blocking factor and inserted a gap into
the disk stream, it might add a few bytes every 20 blocks (10K), and
that maybe, by using a higher block size, I would get smaller tar files
(by some tiny amount).
Instead, the smaller the block size, the smaller the tar file.

In fact, looks like for optimal byte size blocksize=1 is best.

So what's the scoop?

My little test, using 146 files out of /tmp, gives the sizes below (in bytes).
tmp is the actual directory; the other files are named after their
blocking factor, so tmp-4.tar = -b 4, etc.
Files of the same length are on the same line, abbreviated with curly brackets.

3864378:        tmp
3974144:        tmp-{1,2}.tar
3975168:        tmp-4.tar
3977216:        tmp-8.tar
3981312:        tmp-16.tar
3983360:        tmp-{20,default}.tar
3997696:        tmp-{64,128}.tar
4063232:        tmp-256.tar
4194304:        tmp-{512,1024,2048,4096,8192}.tar
8388608:        tmp-16384.tar

I did the tests on tmpfs to compare CPU time:

CPU-wise, size=1 took ~60-80% longer than size=4;
          size=2 was about 20% slower;
          size=16384 took almost as long as size=1 (can you guess about
          what size my CPU cache is?)

When I did the test on an xfs file system, I noticed no difference in
CPU time (disk came into play too much).

So I'm guessing the blocking factor is rounding up file sizes, or at
least some of the file sizes, to a multiple of the blocking factor
(maybe when the next file won't fit in the current 'tarblock')?

Well, at least I know not to *INCREASE* the default :-) (even though
decreasing it isn't THAT much of a savings...)

Linda





Re: blocking factor on tar..maybe should be more clear in man page? (even info isn't clear)...

by Tim Kientzle

Linda A. Walsh wrote:
> I thought the blocking factor inserted a gap in the file-stream for
> synchronization (from reading the info section on tar).

No, there's no gap.  Tar simply ensures that each
write to the output archive is exactly the same size.
This is due to tar's legacy as a program that writes
to and reads from tape drives.
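
To make "every write is exactly the same size" concrete, here is a
minimal Python sketch of a blocked writer. It is only an illustration
of the idea, not GNU tar's actual code; the function name
write_blocked and its parameters are made up for this example, and it
assumes the usual 512-byte tar block with a blocking factor counted in
those blocks.

```python
import io

BLOCK = 512  # tar's basic block size in bytes


def write_blocked(chunks, out, blocking_factor=20):
    """Write chunks to out in records of exactly blocking_factor * 512 bytes.

    Hypothetical helper for illustration. Returns the number of write
    calls issued, each of which is exactly one full record.
    """
    record = blocking_factor * BLOCK
    buf = bytearray()
    writes = 0
    for chunk in chunks:
        buf += chunk
        # Emit full records as soon as enough data has accumulated.
        while len(buf) >= record:
            out.write(bytes(buf[:record]))
            del buf[:record]
            writes += 1
    if buf:
        # Zero-pad the final partial record up to the full record size.
        buf += bytes(record - len(buf))
        out.write(bytes(buf))
        writes += 1
    return writes


if __name__ == "__main__":
    out = io.BytesIO()
    n = write_blocked([b"a" * 3000, b"b" * 9000], out, blocking_factor=20)
    print(n, len(out.getvalue()))
```

Nothing is inserted between records; the only extra bytes are the
zeros that fill out the last record.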

Tape drives deal with data in "blocks" that are
similar to the "sectors" of disk drives.  Because
of how tape drives work, you can only read full
blocks at a time, so the block size must be exactly
the same when reading as was used when writing.
(Although GNU tar has been experimenting with
using a large initial read, noting the size of the
block that's actually returned and then using
that size for all subsequent blocks.)

> I thought -- if it used a blocking factor, and it inserted a gap into
> the disk stream, then maybe it might add a few bytes every 20blocks/10K
> and that maybe, by using a higher block size, I  would get smaller tar
> files...

Nope, tar does not insert data between blocks.

When writing to a disk file, the only practical
implication of the block size is that the final
file is rounded up to a multiple of the block
size.  This is usually not a problem for large
archives (a few kilobytes doesn't make much
difference when writing multi-gigabyte backup
files) or archives that are being compressed
(trailing zero bytes compress very well).
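
The rounding can be checked against the sizes Linda posted. The Python
sketch below assumes that a blocking factor of N means records of
N * 512 bytes, and it takes the size of tmp-1.tar (3974144 bytes) as
the unpadded archive size; padded_size is a name invented for this
example.

```python
BLOCK = 512  # bytes per tar block; -b N means records of N * 512 bytes


def padded_size(unpadded, blocking_factor):
    """Round unpadded up to the next multiple of the record size."""
    record = blocking_factor * BLOCK
    return -(-unpadded // record) * record  # ceiling division, then scale


if __name__ == "__main__":
    base = 3974144  # size of tmp-1.tar from the original post
    for b in (1, 2, 4, 8, 16, 20, 64, 128, 256, 512, 8192, 16384):
        print(f"-b {b:<6} {padded_size(base, b)}")
```

Under those assumptions the computed sizes agree with the posted
table, for example 3983360 bytes for the default factor of 20 and
8388608 bytes for a factor of 16384, where one record is already
larger than the whole archive.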

With most tar implementations, a larger block
size means each write to disk is larger, which
is usually more efficient.

If the tar program knows it's reading/writing
to disk, then it could (at least in theory)
be a little sloppy with the block handling
to get good performance even with a small
block size.  I don't know offhand if GNU tar
does this.

Tim