cpio suggestion: --quick-exit

by Carl Sopchak

Hi,

This is a suggestion for a new option to cpio, --quick-exit.  The purpose of
the option is to exit cpio as soon as the requested file(s) have been
restored, instead of reading all the way through the archive.  The theory
behind it is that the archive is more or less organized by directory, and
once you pass the end of a directory in the archive, it won't appear again.  
This obviously assumes that the --append option wasn't used.  If it was used on
the archive (or if you don't know), you would not use the --quick-exit
option.  Also, it would only be useful if a wildcard character in the file(s)
to be restored were at least one directory down from the root of the archive,
or if there were no wildcard characters at all.

For example, if cpio was invoked with
        cpio --extract --quick-exit "home/carl/*"
cpio would scan the archive until it came to the home/carl directory, restore
all of the contents of that directory, then immediately exit.
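
A minimal sketch of that behavior, assuming the archive can be modeled as an
ordered list of member names. The function name quick_exit_scan and the sample
archive are hypothetical, and Python's fnmatchcase (whose "*" also matches "/")
only stands in for cpio's own pattern matching:

```python
from fnmatch import fnmatchcase


def quick_exit_scan(names, pattern):
    """Simulate the proposed --quick-exit for one wildcard pattern.

    Returns (restored, entries_read). Scanning stops at the first
    non-matching entry seen after at least one match, on the assumption
    that the directory's entries are contiguous in the archive.
    """
    restored = []
    entries_read = 0
    for name in names:
        entries_read += 1
        if fnmatchcase(name, pattern):
            restored.append(name)
        elif restored:
            # We were inside the matching run and have now left it.
            break
    return restored, entries_read


# Hypothetical archive listing, in the order the entries were written.
archive = [
    "etc/hosts",
    "home/carl/a.txt",
    "home/carl/docs/b.txt",
    "home/dave/c.txt",
    "var/log/messages",
]

restored, entries_read = quick_exit_scan(archive, "home/carl/*")
print(restored)
print(entries_read, "of", len(archive), "entries read")
```

Note that the scan still has to read one entry past the directory to notice
that it has ended.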

To be honest, I'm kinda surprised that this hasn't been thought of before.  Is
there a reason why this would be impractical to implement?

I would consider implementing this option, as it would benefit me, but I would
need someone to point me in the right direction...

Thanks for the consideration,

Carl



Re: cpio suggestion: --quick-exit

by Tim Kientzle

Carl Sopchak wrote:
> Hi,
>
> This is a suggestion for a new option to cpio, --quick-exit.  The purpose of
> the option is to exit cpio as soon as the requested file(s) have been
> restored, instead of reading all the way through the archive.

Perhaps it would be better to implement the GNU tar --occurrence
option:

     --occurrence[=NUMBER]
        process only the NUMBERth occurrence of each file
        in the archive; this option is valid only in
        conjunction with one of the subcommands --delete,
        --diff, --extract or --list and when a list of
        files is given either on the command line or via
        the -T option; NUMBER defaults to 1

or the older --fast-read aka -q supported by FreeBSD's
tar (which used to be a patched GNU tar, since replaced
by bsdtar):

    -q (--fast-read)
        (x and t mode only) Extract or list only the first archive entry
        that matches each pattern or filename operand.  Exit as soon as
        each specified pattern or filename has been matched.  By default,
        the archive is always read to the very end, since there can be
        multiple entries with the same name and, by convention, later
        entries overwrite earlier entries.  This option is provided as a
        performance optimization.
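
A literal reading of the excerpt above can be sketched as follows. This is
not bsdtar's code: fast_read_scan and the sample archive are hypothetical
names, the archive is modeled as a list of member names, and fnmatchcase only
approximates the real matcher:

```python
from fnmatch import fnmatchcase


def fast_read_scan(names, patterns):
    """Keep only the first entry matching each pattern, then stop.

    Returns (extracted, entries_read). Assumes patterns is non-empty.
    """
    pending = list(patterns)
    extracted = []
    entries_read = 0
    for name in names:
        entries_read += 1
        hits = [p for p in pending if fnmatchcase(name, p)]
        if hits:
            extracted.append(name)
            # Each pattern that matched is satisfied and is dropped.
            pending = [p for p in pending if p not in hits]
        if not pending:
            break  # every operand has been matched once
    return extracted, entries_read


# Hypothetical archive listing.
archive = [
    "etc/hosts",
    "home/carl/a.txt",
    "home/carl/b.txt",
    "var/log/messages",
]

extracted, entries_read = fast_read_scan(archive, ["home/carl/*"])
print(extracted, entries_read)
```

Under this reading a wildcard operand is satisfied by its first match, so only
one file under home/carl/ is extracted before the scan stops.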

Your notion of allowing this to work for specifying a directory
(exiting as soon as something outside of the directory is seen)
sounds rife for confusion, though.  Besides append operation,
it's a little dangerous to make strong assumptions about the
order in which items are written into archives.  (Witness the
very different orders that GNU tar, star, and bsdtar use
for visiting directories.)

Tim



Re: cpio suggestion: --quick-exit

by Carl Sopchak

Tim,

Thanks for the feedback.  

I don't think --occurrence is equivalent to what I'm trying to achieve, as it
sounds like cpio would still have to read the entire archive (which is what
I'm trying to avoid) if a wildcard is specified.

It sounds like my --quick-exit is in the same vein as --fast-read.  However,
the description provided sounds to me (I'm no expert by any means) like if I
requested a restore of "home/carl/*" using --fast-read then I'd only get the
first file matched (not the entire subtree) which is not what I'm looking
for.  If this is an incorrect interpretation, then I think the two are the
same, and using --fast-read instead of --quick-exit in cpio is fine with me.  
(Although, IMHO, --quick-exit more aptly describes its function...)

I am also not familiar with all of the flavors of tar, so I do not know how
directories are visited in each.  But then again, cpio doesn't even do tree
traversal (you have to feed it file names to back up).  This option could be
available for archives that have been purposefully structured to take
advantage of it, like my full and daily backups.

I'm somewhat baffled by the "rife for confusion" remark.  I think it could be
explained quite succinctly:  "If a wildcard is used to specify the files to
be restored and --quick-exit [or --fast-read] is specified, the archive will
be read until the first file matches.  Subsequent files will be restored
until the path up to the wildcard specification changes in the archive, at
which point cpio will exit unless there are other paths specified on the
command line that have not been restored.  For example..."
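
The rule quoted above could be sketched roughly like this for several
operands at once. Everything here is an assumption for illustration: the
helper names fixed_prefix and quick_exit_scan, the list-of-names archive
model, and the use of fnmatchcase in place of cpio's matcher:

```python
import re
from fnmatch import fnmatchcase


def fixed_prefix(pattern):
    """Path up to the wildcard, e.g. 'home/carl/*' -> 'home/carl/'.

    A pattern without wildcards is its own prefix. A wildcard in the
    first path component gives '', so that operand never completes and
    the whole archive is read.
    """
    m = re.search(r"[*?\[]", pattern)
    if m is None:
        return pattern
    literal = pattern[: m.start()]
    return literal[: literal.rfind("/") + 1]


def quick_exit_scan(names, patterns):
    """Return (restored, entries_read) under the proposed rule."""
    prefixes = {p: fixed_prefix(p) for p in patterns}
    started, done = set(), set()
    restored = []
    entries_read = 0
    for name in names:
        entries_read += 1
        matched = False
        for p, prefix in prefixes.items():
            if p in done:
                continue
            if fnmatchcase(name, p):
                started.add(p)
                matched = True
            elif p in started and not name.startswith(prefix):
                done.add(p)  # path up to the wildcard has changed
        if matched:
            restored.append(name)
        if len(done) == len(prefixes):
            break  # nothing left that has not been restored
    return restored, entries_read


# Hypothetical archive listing.
archive = [
    "etc/hosts", "etc/passwd",
    "home/carl/a.txt", "home/carl/b.txt",
    "home/dave/c.txt", "usr/bin/x", "var/log/messages",
]
patterns = ["home/carl/*", "etc/hosts"]

restored, entries_read = quick_exit_scan(archive, patterns)
print(restored, entries_read)

# An entry added later (for example with --append) is never reached.
late, _ = quick_exit_scan(archive + ["home/carl/late.txt"], patterns)
print("home/carl/late.txt" in late)
```

The second call shows the ordering assumption at work: an entry for
home/carl/ that appears after the scan has left that directory is silently
skipped, so the option only suits archives known to be laid out contiguously.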

Is there a design philosophy for cpio that it remain as close to tar in
command line options and mode of operation as possible?  I realize the
benefit of this consistency, but they are different programs, after all...

Anyway, it sounds like this is a do-able change.  Any other comments,
suggestions, or pointers as to how I might proceed?

Thanks,

Carl

(In reply to Tim Kientzle's message of Saturday, August 22, 2009, quoted in
full above.)