join with header line support

Hello,
I'd like to suggest a small feature for 'join':

"--header" makes join join the first line from each file regardless of the join field and ordering. This allows joining files which have header lines in them.

Example:
===============
$ cat 1.txt
ID Color Name
1 green Alice
2 red Bob
3 blue Carol
4 black Dave


$ cat 2.txt
ID Age
2 55
4 24

$ join --check-order --header -j 1 -a 1 -e unknown -o "0 1.3 2.2" 1.txt 2.txt
ID Name Age
1 Alice unknown
2 Bob 55
3 Carol unknown
4 Dave 24
===============

Although the above can be accomplished by using several other utilities (cut, head, paste, sed or similar combination), having this feature built into join makes life a lot easier - especially if I'm joining several files (using pipes), or using specific output fields (with "-o") - join will thus take care of extracting the right field header into the header line.

The following patch adds the "--header" feature. If "--header" is not used, there are no changes to the regular program flow.

Comments are welcome. This patch is released under GPLv3 or later. If you're willing to accept this patch, I'll be happy to assign copyright to GNU, etc.

thanks,
gordon

=============================
--- join.orig.c	2009-09-23 04:25:44.000000000 -0400
+++ join.c	2009-10-30 19:00:01.000000000 -0400
@@ -146,6 +146,7 @@ static struct option const longopts[] =
   {"ignore-case", no_argument, NULL, 'i'},
   {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
   {"nocheck-order", no_argument, NULL, NOCHECK_ORDER_OPTION},
+  {"header", no_argument, NULL, 'H'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
   {NULL, 0, NULL, 0}
@@ -157,6 +158,10 @@ static struct line uni_blank;
 /* If nonzero, ignore case when comparing join fields.  */
 static bool ignore_case;
 
+/* If nonzero, treat the first line of each file as column headers -
+   join them without checking for ordering */
+static bool join_header_lines;
+
 void
 usage (int status)
 {
@@ -191,6 +196,7 @@ by whitespace. When FILE1 or FILE2 (not
   --check-order     check that the input is correctly sorted, even\n\
                       if all input lines are pairable\n\
   --nocheck-order   do not check that the input is correctly sorted\n\
+  --header treat first line in each file as field header line.\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
@@ -616,6 +622,15 @@ join (FILE *fp1, FILE *fp2)
   initseq (&seq2);
   getseq (fp2, &seq2, 2);
 
+  if (join_header_lines && seq1.count && seq2.count)
+  {
+      prjoin(seq1.lines[0], seq2.lines[0]);
+      prevline[0] = NULL ;
+      prevline[1] = NULL ;
+      advance_seq (fp1, &seq1, true, 1);
+      advance_seq (fp2, &seq2, true, 2);
+  }
+
   while (seq1.count && seq2.count)
     {
       size_t i;
@@ -1052,6 +1067,10 @@ main (int argc, char **argv)
                                      &nfiles, &prev_optc_status, &optc_status);
           break;
 
+        case 'H':
+          join_header_lines = true ;
+          break;
+
         case_GETOPT_HELP_CHAR;
 
         case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
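As the message notes, the same output is already reachable with other utilities. Below is a minimal POSIX sh sketch of that workaround for the example above. `join_with_header` is a hypothetical helper, not a coreutils command; it assumes each file has exactly one header line followed by lines sorted on the join field, and that `mktemp` is available.

```shell
#!/bin/sh
# Sketch: emulate the proposed "join --header" with head, tail and join.
# join_with_header is a hypothetical helper name (not part of coreutils).
# Assumes one header line per file and data lines sorted on the join field.
# mktemp is widely available but is not required by POSIX.

join_with_header () {
  f1=$1 f2=$2
  shift 2                      # remaining arguments are passed to join
  tmp=$(mktemp) || return 1

  # Pair the two header lines with the same options as the data, so that
  # -o selects the matching column names.
  head -n 1 "$f2" > "$tmp"
  head -n 1 "$f1" | join "$@" - "$tmp"

  # Pair the data lines that follow the headers.
  tail -n +2 "$f2" > "$tmp"
  tail -n +2 "$f1" | join "$@" - "$tmp"

  rm -f "$tmp"
}

# Sample input taken from the example in the message.
printf '%s\n' 'ID Color Name' '1 green Alice' '2 red Bob' \
  '3 blue Carol' '4 black Dave' > 1.txt
printf '%s\n' 'ID Age' '2 55' '4 24' > 2.txt

join_with_header 1.txt 2.txt --check-order -j 1 -a 1 -e unknown -o '0 1.3 2.2'
```

Passing the same options to both join calls is what lets `-o` pick the right header fields; if the header join fields do not match, `-a 1` would still print the first file's header, padded with the `-e` value.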
Re: join with header line support

According to Assaf Gordon on 10/30/2009 5:02 PM:
> Although the above can be accomplished by using several other utilities
> (cut, head, paste, sed or similar combination), having this feature
> built into join makes life a lot easier - especially if I'm joining
> several files (using pipes), or using specific output fields (with
> "-o") - join will thus take care of extracting the right field header
> into the header line.

First off, thanks for taking the time to contribute. Whether or not this goes anywhere, and whether or not my email seems like a harsh critique, you should know that one of the joys of free software is that you were able to scratch your own itch, and that you can use it whether or not it gets folded in upstream.

That said...

The bar is very high for adding new options, especially for burning a short option on something that doesn't have much background. That doesn't necessarily mean we are outright refusing your patch, but since you admitted that this can already be done with standardized tools, it may be a better use of our time to add an example in the documentation of how to achieve the same effect (or, in the process of writing such documentation, show us how hairy that construct turned out to be and why it is worth inlining). That way, people can use the hairy construct now, even if they don't have GNU coreutils, rather than waiting several years for your new convenience feature to reach enough machines that its presence can be assumed without manually upgrading coreutils first.

> Comments are welcome. This patch is released under GPLv3 or later.
> If you're willing to accept this patch, I'll be happy to assign
> copyright to GNU, etc.

You'll need documentation, an addition to the testsuite, a mention in the NEWS file, and so forth, before this patch could be worthy of inclusion (and that is ignoring the technical issue of whether we want this feature, on which I am abstaining from giving my opinion at the moment). All told, it will amount to a non-trivial patch, so yes, you would need to start the paperwork process of assigning copyright to the FSF; let us know if you want to pursue this route further. The HACKING file in a git checkout has more details on writing a bulletproof patch.

> @@ -191,6 +196,7 @@ by whitespace. When FILE1 or FILE2 (not
>    --check-order     check that the input is correctly sorted, even\n\
>                        if all input lines are pairable\n\
>    --nocheck-order   do not check that the input is correctly sorted\n\
> +  --header treat first line in each file as field header line.\n\

The alignment looks weird here.

> +  if (join_header_lines && seq1.count && seq2.count)
> +  {

This won't compile. And even if it did, it doesn't match neighboring style. It's hard to review something that isn't even complete.

> +      prjoin(seq1.lines[0], seq2.lines[0]);
> +      prevline[0] = NULL ;

No space before ';'; there are multiple instances in your patch.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@...
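Eric asks for the standard-tools construct to be documented, including how hairy it gets. The pipeline case Assaf mentions is the harder one, because the header has to be split off a stream that can only be read once. Below is a sketch under the same assumptions (one header line, data sorted on the join field). `join_stream_with_header` and the third input file `3.txt` are hypothetical; the header is taken with the shell's `read`, which consumes only the first line of standard input and leaves the rest for join.

```shell
#!/bin/sh
# Sketch: header-aware join whose left input comes from stdin, so several
# joins can be chained with pipes.  join_stream_with_header is a
# hypothetical helper name.  Assumes one header line per input and data
# lines sorted on the join field; mktemp is assumed to be available.

join_stream_with_header () {
  f2=$1
  shift                        # remaining arguments are passed to join
  # read takes only the first line of stdin; later commands see the rest.
  IFS= read -r hdr1 || return 1
  tmp=$(mktemp) || return 1

  # Join the left header with the right file's header.
  head -n 1 "$f2" > "$tmp"
  printf '%s\n' "$hdr1" | join "$@" - "$tmp"

  # Join the remaining stdin lines with the right file's data lines.
  tail -n +2 "$f2" > "$tmp"
  join "$@" - "$tmp"

  rm -f "$tmp"
}

printf '%s\n' 'ID Color Name' '1 green Alice' '2 red Bob' \
  '3 blue Carol' '4 black Dave' > 1.txt
printf '%s\n' 'ID Age' '2 55' '4 24' > 2.txt
# Hypothetical third file, added only to show a two-stage pipeline.
printf '%s\n' 'ID City' '1 Paris' '2 Rome' '4 Oslo' > 3.txt

join_stream_with_header 2.txt < 1.txt | join_stream_with_header 3.txt
```

Without `-a`, each stage keeps only paired lines, so the result has the combined header plus rows for IDs 2 and 4. Passing `-o` through such a chain is awkward because field numbers shift after every stage, which is part of the hairiness Eric asks to see.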
Re: join with header line support

Assaf Gordon wrote:
> Hello,
>
> I'd like to suggest a small feature for 'join':
>
> "--header" makes join join the first line from each file regardless of
> the join field and ordering.
> This allows joining files which have header lines in them.
>
> Example:
> ===============
> $ cat 1.txt
> ID Color Name
> 1 green Alice
> 2 red Bob
> 3 blue Carol
> 4 black Dave
>
>
> $ cat 2.txt
> ID Age
> 2 55
> 4 24

I like that.

cheers,
Pádraig.