tar + lbzip2 proposal

Dear GNU Tar Maintainer,

here's an idea to add support for lbzip2 and other parallel bzip2
implementations to GNU Tar. I'm asking for your opinion. I'm willing to
implement the suggested functionality if you accept the proposal (after
necessary amendments, of course).

In my preliminary understanding, the name of the compression program to
use is pointed to by the "use_compress_program_option" global variable.
This variable can be set up in a multitude of ways:

1. -j / --bzip2 and the like set it up by a call to
   set_use_compress_program_option() with a fixed argument -- specifying
   more than one distinct value via this function (e.g. with -j -z) makes
   tar exit with an error.

2. -I / --use-compress-program allows the user to specify the argument to
   set_use_compress_program_option() directly.

3. -a / --auto-compress selects the compression program at archive
   creation time from the name suffix of the file-to-be-created, by
   calling set_compression_program_by_suffix(). If this attempt fails
   because of an unknown suffix, then tar doesn't override a compression
   program specified otherwise (see 1 and 2 above).

   Thus, when creating an archive, if both -j (or --use=bzip2) and -a are
   specified, and -f has argument "file.tar.gz", then -a takes precedence
   and gzip will be selected. If -f has argument "file.tar.qqq", then -j
   takes effect.

4. If the user didn't specify a compression program via methods 1 or 2,
   then at testing/extraction time tar selects the compression program
   according to the magic signature stored in the file. If that fails,
   tar falls back to the suffix-based method.

   open_compressed_archive()
     -> compress_program()
          -> magic[].program
     -> set_compression_program_by_suffix()
          -> find_compression_program()
               -> compression_suffixes[].program

This list is possibly incomplete and/or inaccurate. It would be important
to identify all write access sites to "use_compress_program_option";
please verify the list! Thank you.

The array "compression_suffixes" could be static, I think, just like
"magic" is.
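
The creation-time precedence described in points 1 to 3 can be sketched in isolation. This is only an illustration of the rule "a known suffix wins under -a, otherwise the explicitly chosen program stays", not tar's code: the table contents and the names suffix_table, program_by_suffix and choose_program_for_create are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for tar's suffix table. */
struct suffix_entry
{
  const char *suffix;
  const char *program;
};

static const struct suffix_entry suffix_table[] = {
  { "gz", "gzip" },
  { "tgz", "gzip" },
  { "bz2", "bzip2" },
  { "xz", "xz" },
};

/* Return the program matching NAME's last suffix, or NULL if unknown. */
static const char *
program_by_suffix (const char *name)
{
  const char *dot;
  size_t i;

  assert (name != NULL);
  dot = strrchr (name, '.');
  if (dot == NULL)
    return NULL;
  for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
    if (strcmp (dot + 1, suffix_table[i].suffix) == 0)
      return suffix_table[i].program;
  return NULL;
}

/* EXPLICIT_PROGRAM is what -j, -z, -I etc. selected (NULL if none).
   With AUTO_COMPRESS set (-a), a recognized suffix overrides it;
   an unknown suffix leaves the explicit choice untouched.  */
static const char *
choose_program_for_create (const char *explicit_program, int auto_compress,
                           const char *archive_name)
{
  if (auto_compress)
    {
      const char *by_suffix = program_by_suffix (archive_name);
      if (by_suffix != NULL)
        return by_suffix;
    }
  return explicit_program;
}
```

With these definitions, -j -a -f file.tar.gz corresponds to choose_program_for_create ("bzip2", 1, "file.tar.gz") and yields "gzip", while the same call with "file.tar.qqq" falls back to "bzip2".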

In general, --use-compress-program cannot be added to TAR_OPTIONS.

Proposal:

* Introduce new global variable "bzip2_filter", with default value
  "bzip2". The variable has type "const char *".

* Introduce new command line option "--bzip2-filter" to change the value
  of the variable "bzip2_filter". Thus the option requires an argument.
  The option can be passed only once on the command line and only before
  setting "use_compress_program_option" in any way.

* The character array pointed to by "bzip2_filter" lives in either static
  storage (default "bzip2") or automatic storage (parameter to main()).
  It can't be modified or freed.

* Modify case 1 (-j / --bzip2) to pass the value of "bzip2_filter" to
  set_use_compress_program_option(), instead of a fixed "bzip2" string.

* Case 2 is unchanged.

* Change the compress_program() macro definition into a real static
  function that handles the bz2 magic value as an exception, and returns
  the value of "bzip2_filter". The strings currently returned by
  compress_program() from magic[] also have static storage class.

* Change set_compression_program_by_suffix() to handle the bzip2 suffixes
  as exceptions, and to return the value of "bzip2_filter". The strings
  currently returned by this function from compression_suffixes[] also
  have static storage class.

* Due to the last two points, the auto-selection methods in 3 and 4 will
  use the program passed by --bzip2-filter (or per default bzip2) where
  bzip2 is auto-selected now.

* As development advances, more and more multi-threaded alternatives
  might be added to tar, with --gzip-filter for pigz, for example. Once
  the exceptions in compress_program() and
  set_compression_program_by_suffix() start to proliferate, flat tables
  would become desirable again, i.e. extending the current magic[] and
  compression_suffixes[] arrays with pointers to global variables, each
  holding the selected alternative for that family of compression. Maybe
  this is the preferred way to start out with even now.

* Usage: user prepends "--bzip2-filter=lbzip2" to her TAR_OPTIONS.

* On Debian, the tar source could be patched, so that "bzip2_filter"
  defaults to "/etc/alternatives/bzip2-filter", which would be a symlink
  to /bin/bzip2 per default. Packages like "lbzip2" and "pbzip2" would
  add alternatives.

I'm greatly interested in your opinion,
thanks,
lacos
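
The "flat table" variant mentioned near the end of the proposal, where table entries point at per-family global variables, might look roughly like the sketch below. Only "bzip2_filter" comes from the proposal; gzip_filter, struct magic_entry, magic_table and program_by_magic are hypothetical names, and the table is deliberately reduced to two entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Proposed global: the program used wherever bzip2 is selected. */
static const char *bzip2_filter = "bzip2";

/* Hypothetical sibling for the gzip family (e.g. for pigz). */
static const char *gzip_filter = "gzip";

/* Each entry refers to the global holding the currently selected
   alternative, so changing the global changes every lookup.  */
struct magic_entry
{
  size_t length;
  const char *magic;
  const char **program;
};

static const struct magic_entry magic_table[] = {
  { 2, "\037\213", &gzip_filter },
  { 3, "BZh", &bzip2_filter },
};

/* Return the program for the signature at the start of BUF,
   or NULL if no known signature matches.  */
static const char *
program_by_magic (const char *buf, size_t len)
{
  size_t i;

  assert (buf != NULL);
  for (i = 0; i < sizeof magic_table / sizeof magic_table[0]; i++)
    if (len >= magic_table[i].length
        && memcmp (buf, magic_table[i].magic, magic_table[i].length) == 0)
      return *magic_table[i].program;
  return NULL;
}
```

An option handler for --bzip2-filter would then only need to assign its argument to bzip2_filter; neither the magic lookup nor a similarly built suffix table would need special cases.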
Re: tar + lbzip2 proposal

On Tue, 6 Oct 2009, ERSEK Laszlo wrote:

> Proposal:
>
> * Usage: user prepends "--bzip2-filter=lbzip2" to her TAR_OPTIONS.
>
> * On Debian, the tar source could be patched, so that "bzip2_filter"
>   defaults to "/etc/alternatives/bzip2-filter", which would be a symlink to
>   /bin/bzip2 per default. Packages like "lbzip2" and "pbzip2" would add
>   alternatives.

* User sets specific options for her choice of bzip2-filter by setting
  appropriate environment variables. For example,

  export LBZIP2_WORKER_THREADS=2
  export LBZIP2_PRINT_STATS=1

Thanks,
lacos
Re: tar + lbzip2 proposal

Hi Laszlo,

> Proposal:

The proposal is interesting, but the basic question is: does lbzip2 offer
the same functionality as bzip2? If so, why would one bother to have both
bzip2 and lbzip2 on his box? Instead, one can simply remove the former
and make a symlink lbzip2 -> bzip2. In that case no changes are necessary
at all.

What do you think?

Regards,
Sergey
Re: tar + lbzip2 proposal

On Tue, 6 Oct 2009, Sergey Poznyakoff wrote:

> Hi Laszlo,
>
> The proposal is interesting, but the basic question is: does lbzip2
> offer the same functionality as bzip2?

It doesn't. lbzip2 isn't a drop-in replacement for bzip2 (and as far as I
can tell, it never will be).

Cheers,
lacos

$ lbzip2 -h
lbzip2: Parallel bzip2 filter.
Copyright (C) 2008, 2009 Laszlo Ersek. Released under the GNU GPLv2+.
Version 0.15.

Usage:
  1. lbzip2 [-d] [-n WORKER-THREADS] [-v] [-t]
  2. lbzip2 -h

Options:
  -d                : Decompress.
  -n WORKER-THREADS : Set number of (de)compressor threads to
                      WORKER-THREADS. WORKER-THREADS must be in
                      [1, 1073741823]. If this option is not specified,
                      the environment variable LBZIP2_WORKER_THREADS is
                      consulted. If LBZIP2_WORKER_THREADS is unset or
                      empty, lbzip2 queries the system for the number of
                      online processors.
  -v                : Print condition variable statistics in the end. If
                      this option is not specified, the environment
                      variable LBZIP2_PRINT_STATS is consulted: statistics
                      will be printed if and only if LBZIP2_PRINT_STATS
                      has a non-empty value.
  -t                : Print memory allocation trace. If this option is not
                      specified, the environment variable
                      LBZIP2_TRACE_ALLOC is consulted: allocation trace
                      will be printed if and only if LBZIP2_TRACE_ALLOC
                      has a non-empty value. Check trace with
                      "malloc_trace.pl".
  -h                : Print this help and exit successfully.
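
The fallback order that the help text gives for -n (command line, then LBZIP2_WORKER_THREADS, then the number of online processors) can be sketched as follows. This is not taken from the lbzip2 source: parse_workers and resolve_workers are hypothetical names, the use of sysconf (_SC_NPROCESSORS_ONLN) is an assumption about how the system could be queried, and returning -1 for an invalid value is likewise an assumption, since the help text does not say how errors are reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Upper bound quoted in the help text. */
#define MAX_WORKERS 1073741823L

/* Parse STR as a decimal worker count in [1, MAX_WORKERS].
   Return -1 if STR is not a valid count (assumed error convention).  */
static long
parse_workers (const char *str)
{
  char *end;
  long n;

  assert (str != NULL);
  errno = 0;
  n = strtol (str, &end, 10);
  if (errno != 0 || end == str || *end != '\0' || n < 1 || n > MAX_WORKERS)
    return -1;
  return n;
}

/* OPT_N is the argument of -n, or NULL if -n was not given.  */
static long
resolve_workers (const char *opt_n)
{
  const char *env;
  long n;

  if (opt_n != NULL)
    return parse_workers (opt_n);

  env = getenv ("LBZIP2_WORKER_THREADS");
  if (env != NULL && *env != '\0')
    return parse_workers (env);

  /* Assumption: ask the system for the online processor count.  */
  n = sysconf (_SC_NPROCESSORS_ONLN);
  return n >= 1 ? n : -1;
}
```

Because the environment is consulted only when -n is absent, a value exported alongside TAR_OPTIONS never overrides an explicit command-line choice.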
Re: tar + lbzip2 proposal

ERSEK Laszlo <lacos@...> wrote:

> It doesn't. lbzip2 isn't a drop-in replacement for bzip2.

OK. Then, please find attached a patch that implements an additional
level of indirection between compression type and actual compression
program name. E.g. to invoke lbzip2 whenever the user types the --bzip2
option, one would use:

  tar --standard-compress-program=bzip2:lbzip2 ...

Of course, it is preferable to add this option to TAR_OPTIONS.

That being said, I am still not convinced that the gain compensates for
the additional complexity introduced in the program. Your feedback is, as
always, welcome.

Regards,
Sergey

diff --git a/src/buffer.c b/src/buffer.c
index dd97682..6e9221d 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -65,7 +65,7 @@ FILE *stdlis;
 static void backspace_output (void);
-/* PID of child program, if compress_option or remote archive access. */
+/* PID of child program, if compression_option or remote archive access. */
 static pid_t child_pid;
 /* Error recovery stuff */
@@ -197,44 +197,31 @@ compute_duration ()
 /* Compression detection */
-enum compress_type {
-  ct_tar, /* Plain tar file */
-  ct_none, /* Unknown compression type */
-  ct_compress,
-  ct_gzip,
-  ct_bzip2,
-  ct_lzma,
-  ct_lzop,
-  ct_xz
-};
-
 struct zip_magic
 {
-  enum compress_type type;
+  enum compression_type type;
   size_t length;
   char *magic;
-  char *program;
   char *option;
 };
 static struct zip_magic const magic[] = {
-  { ct_tar },
-  { ct_none, },
-  { ct_compress, 2, "\037\235", "compress", "-Z" },
-  { ct_gzip, 2, "\037\213", "gzip", "-z" },
-  { ct_bzip2, 3, "BZh", "bzip2", "-j" },
-  { ct_lzma, 6, "\xFFLZMA", "lzma", "--lzma" }, /* FIXME: ???? */
-  { ct_lzop, 4, "\211LZO", "lzop", "--lzop" },
-  { ct_xz, 6, "\0xFD7zXZ", "-J" },
+  { compression_tar },
+  { compression_none, },
+  { compression_compress, 2, "\037\235", "-Z" },
+  { compression_gzip, 2, "\037\213", "-z" },
+  { compression_bzip2, 3, "BZh", "-j" },
+  { compression_lzma, 6, "\xFFLZMA", "--lzma" }, /* FIXME: ???? */
+  { compression_lzop, 4, "\211LZO", "--lzop" },
+  { compression_xz, 6, "\0xFD7zXZ", "-J" },
 };
 #define NMAGIC (sizeof(magic)/sizeof(magic[0]))
-#define compress_option(t) magic[t].option
-#define compress_program(t) magic[t].program
+#define compression_option(t) magic[t].option
 /* Check if the file ARCHIVE is a compressed archive. */
-enum compress_type
+enum compression_type
 check_compressed_archive (bool *pshort)
 {
   struct zip_magic const *p;
@@ -256,13 +243,13 @@ check_compressed_archive (bool *pshort)
   if (tar_checksum (record_start, true) == HEADER_SUCCESS)
     /* Probably a valid header */
-    return ct_tar;
+    return compression_tar;
   for (p = magic + 2; p < magic + NMAGIC; p++)
     if (memcmp (record_start->buffer, p->magic, p->length) == 0)
       return p->type;
-  return ct_none;
+  return compression_none;
 }
 /* Guess if the archive is seekable. */
@@ -312,25 +299,26 @@ open_compressed_archive ()
   if (!use_compress_program_option)
     {
       bool shortfile;
-      enum compress_type type = check_compressed_archive (&shortfile);
+      enum compression_type type = check_compressed_archive (&shortfile);
       switch (type)
         {
-        case ct_tar:
+        case compression_tar:
           if (shortfile)
             ERROR ((0, 0, _("This does not look like a tar archive")));
           return archive;
-        case ct_none:
+        case compression_none:
           if (shortfile)
             ERROR ((0, 0, _("This does not look like a tar archive")));
-          set_comression_program_by_suffix (archive_name_array[0], NULL);
+          set_comression_program_by_suffix (archive_name_array[0],
+                                            compression_tar);
           if (!use_compress_program_option)
             return archive;
           break;
         default:
-          use_compress_program_option = compress_program (type);
+          use_compress_program_option = compression_to_program_name (type);
           break;
         }
     }
@@ -559,15 +547,15 @@ _open_archive (enum access_mode wanted_access)
         case ACCESS_READ:
           {
             bool shortfile;
-            enum compress_type type;
+            enum compression_type type;
             archive = STDIN_FILENO;
             type = check_compressed_archive (&shortfile);
-            if (type != ct_tar && type != ct_none)
+            if (type != compression_tar && type != compression_none)
               FATAL_ERROR ((0, 0,
                             _("Archive is compressed. Use %s option"),
-                            compress_option (type)));
+                            compression_option (type)));
             if (shortfile)
               ERROR ((0, 0, _("This does not look like a tar archive")));
           }
@@ -616,8 +604,8 @@ _open_archive (enum access_mode wanted_access)
   switch (check_compressed_archive (NULL))
     {
-    case ct_none:
-    case ct_tar:
+    case compression_none:
+    case compression_tar:
       break;
     default:
diff --git a/src/common.h b/src/common.h
index 0020f08..f39d594 100644
--- a/src/common.h
+++ b/src/common.h
@@ -782,7 +782,22 @@ bool transform_name_fp (char **pinput, int type,
                         char *(*fun)(char *, void *), void *);
 /* Module suffix.c */
-void set_comression_program_by_suffix (const char *name, const char *defprog);
+enum compression_type
+  {
+    compression_tar, /* Plain tar file */
+    compression_none, /* Unknown compression type */
+    compression_compress,
+    compression_gzip,
+    compression_bzip2,
+    compression_lzma,
+    compression_lzop,
+    compression_xz
+  };
+
+void set_comression_program_by_suffix (const char *name,
+                                       const char *defname);
+const char *compression_to_program_name (enum compression_type type);
+void set_compression_program_name (char *arg);
 /* Module checkpoint.c */
 void checkpoint_compile_action (const char *str);
@@ -834,3 +849,5 @@ void finish_deferred_unlinks (void);
 /* Module exit.c */
 extern void (*fatal_exit_hook) (void);
+
+
diff --git a/src/suffix.c b/src/suffix.c
index 6dbc68e..155f2d7 100644
--- a/src/suffix.c
+++ b/src/suffix.c
@@ -18,16 +18,71 @@
 #include <system.h>
 #include "common.h"
+#include <argmatch.h>
+static const char *compression_program[] = {
+  NULL,
+  NULL,
+  "compress",
+  "gzip",
+  "bzip2",
+  "lzma",
+  "lzop",
+  "xz",
+};
+
+const char *
+compression_to_program_name (enum compression_type type)
+{
+  return compression_program[type];
+}
+
+static const const char *compression_names[] = {
+  "compress",
+  "gzip",
+  "bzip2",
+  "lzma",
+  "lzop",
+  "xz",
+  NULL
+};
+
+static int compression_types[] = {
+  compression_compress,
+  compression_gzip,
+  compression_bzip2,
+  compression_lzma,
+  compression_lzop,
+  compression_xz,
+};
+
+ARGMATCH_VERIFY (compression_names, compression_types);
+
+void
+set_compression_program_name (char *arg)
+{
+  int t;
+  char *p = strchr (arg, ':');
+  if (!p)
+    USAGE_ERROR ((0, 0, _("Invalid value for --standard-compress-program")));
+
+  *p++ = 0;
+  t = XARGMATCH ("--standard-compress-program", arg,
+                 compression_names, compression_types);
+  compression_program[t] = p;
+}
+
+
 struct compression_suffix
 {
   const char *suffix;
   size_t length;
-  const char *program;
+  enum compression_type type;
 };
-struct compression_suffix compression_suffixes[] = {
-#define S(s,p) #s, sizeof (#s) - 1, #p
+static struct compression_suffix compression_suffixes[] = {
+#define __tar_cat2__(a,b) a ## b
+#define S(s,p) #s, sizeof (#s) - 1, __tar_cat2__(compression_,p)
   { S(gz, gzip) },
   { S(tgz, gzip) },
   { S(taz, gzip) },
@@ -47,8 +102,8 @@ struct compression_suffix compression_suffixes[] = {
 int nsuffixes = sizeof (compression_suffixes) /
                 sizeof (compression_suffixes[0]);
-static const char *
-find_compression_program (const char *name, const char *defprog)
+static enum compression_type
+archive_name_to_compression_type (const char *name)
 {
   char *suf = strrchr (name, '.');
@@ -64,16 +119,19 @@ find_compression_program (const char *name, const char *defprog)
         {
           if (compression_suffixes[i].length == len
               && memcmp (compression_suffixes[i].suffix, suf, len) == 0)
-            return compression_suffixes[i].program;
+            return compression_suffixes[i].type;
         }
     }
-  return defprog;
+  return compression_none;
 }
 void
-set_comression_program_by_suffix (const char *name, const char *defprog)
+set_comression_program_by_suffix (const char *name, const char *defname)
 {
-  const char *program = find_compression_program (name, defprog);
+  enum compression_type type = archive_name_to_compression_type (name);
+  const char *program =
+    (type == compression_none) ? defname
+                               : compression_to_program_name (type);
   if (program)
     use_compress_program_option = program;
 }
diff --git a/src/tar.c b/src/tar.c
index a639974..8fb1509 100644
--- a/src/tar.c
+++ b/src/tar.c
@@ -326,6 +326,7 @@ enum
   SHOW_OMITTED_DIRS_OPTION,
   SHOW_TRANSFORMED_NAMES_OPTION,
   SPARSE_VERSION_OPTION,
+  STANDARD_COMPRESS_PROGRAM_OPTION,
   STRIP_COMPONENTS_OPTION,
   SUFFIX_OPTION,
   TEST_LABEL_OPTION,
@@ -629,6 +630,10 @@ static struct argp_option options[] = {
    N_("filter the archive through lzop"), GRID+8 },
   {"xz", 'J', 0, 0,
    N_("filter the archive through xz"), GRID+8 },
+  {"standard-compress-program", STANDARD_COMPRESS_PROGRAM_OPTION,
+   N_("COMPR:PROGRAM"), 0,
+   N_("use PROGRAM when --COMPR option is given; COMPR is one of "
+      "compress, gzip, bzip2, lzma, lzop, or xz"), GRID+8 },
   {"use-compress-program", 'I', N_("PROG"), 0,
    N_("filter through PROG (must accept -d)"), GRID+1 },
 #undef GRID
@@ -1588,6 +1593,10 @@ parse_opt (int key, char *arg, struct argp_state *state)
           }
       }
       break;
+
+    case STANDARD_COMPRESS_PROGRAM_OPTION:
+      set_compression_program_name (arg);
+      break;
     case 't':
       set_subcommand_option (LIST_SUBCOMMAND);