AWK, deleting and comparing lines

by David Scherrer-2

Dear all,

I would like to use AWK to delete lines of a file that have a "C" in their
first field as well as the closest foregoing line containing a "T" in the
first field. When there are 2,3,4,... "C"-lines after each other I would
also like to delete the closest 2,3,4,... foregoing "T"-lines.

Additionally it would be great if I could compare the "C" and the "T" line
and condition deletion on the comparison result. E.g. delete the
closest foregoing "T"-line above with fields 4 and 6 the same as the
"C"-line.

For illustration purposes I copy-pasted a short sequence of lines and marked
the C lines red and the corresponding T lines yellow.

I'm very grateful for any help!

Many thanks,
David


T,N,N,2000000,A,98.025,3.136519,20020702,14:19:20,,98.025,3.136519
T,N,N,2500000,A,98.1503,,20020702,14:20:36,,98.1503,
C,N,N,2500000,A,98.1503,,20020702,14:20:36,,98.025,3.136519
T,N,N,2500000,A,98.1503,,20020702,15:14:27,,98.1503,
T,N,N,1500000,A,98.8,3.083,20020703,9:45:10,3.083,98.8,3.083
T,N,N,1000000,A,100.1067,2.953,20020705,9:31:39,2.512,101.13,2.512
T,N,N,500000,A,101.13,2.512,20020705,10:47:51,2.512,101.13,2.512
T,N,N,3900000,A,85,4.135633,20020708,10:31:43,4.114726,85.25,4.114726
T,N,N,3900000,A,85,4.135633,20020708,10:31:43,4.114726,85.25,4.114726
C,N,N,3900000,A,85,4.135633,20020708,10:31:43,4.114726,85.25,4.114726
C,N,N,3900000,A,85,4.135633,20020708,10:31:43,4.114726,85.25,4.114726
T,N,N,1500000,A,100.875,2.621247,20020708,16:11:18,2.621247,100.875,2.621247
T,N,N,710000,A,100.3156,,20020709,10:34:23,,100.3156,
T,N,N,285000,A,100.875,2.620812,20020709,11:15:58,2.620812,100.875,2.620812
T,N,N,60000,A,99.75,3.01699,20020710,10:45:48,3.01699,99.75,3.01699

Re: AWK, deleting and comparing lines

by John Cowan

David Scherrer scripsit:

> I would like to use AWK to delete lines of a file that have a "C" in
> their first field as well as the closest foregoing line containing a "T"
> in the first field. When there are 2,3,4,... "C"-lines after each other
> I would also like to delete the closest 2,3,4,... foregoing "T"-lines.
>
> Additionally it would be great if I could compare the "C" and the "T"
> line and condition deletion on the comparison result. E.g. delete
> the closest foregoing "T"-line above with fields 4 and 6 the same as the
> "C"-line.

This involves deleting lines you have already processed.  Your only
hope is to slurp the whole file into an array of lines indexed by line
number and do all the work in the END block, printing out the array
as the last thing.  This may make your awk implementation blow up,
depending on what implementation it is and what computer you are using.
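
A minimal sketch of that slurp-then-filter approach, wrapped in a shell
function so it can be piped. The function name drop_cancelled and the cmp
switch are made up for illustration, and it assumes that "closest" means the
nearest earlier "T"-line that has not already been deleted. With no argument
every "C"-line removes itself and that "T"-line; with an argument of 1 the
"T"-line must also agree with the "C"-line in fields 4 and 6.

```sh
# drop_cancelled [1] < input.csv
# Hypothetical helper: reads comma-separated lines on stdin and prints
# the lines that survive. Pass 1 to require matching fields 4 and 6.
drop_cancelled() {
  awk -F, -v cmp="${1:-0}" '
  {
    line[NR] = $0          # whole line, indexed by line number
    tag[NR]  = $1          # first field: "T" or "C"
    key[NR]  = $4 FS $6    # fields 4 and 6, used only when cmp is set
  }
  END {
    for (i = 1; i <= NR; i++) {
      if (tag[i] != "C") continue
      drop[i] = 1          # the "C"-line itself always goes
      # walk backwards to the nearest "T"-line not yet deleted
      for (j = i - 1; j >= 1; j--) {
        if (tag[j] == "T" && !(j in drop) && (!cmp || key[j] == key[i])) {
          drop[j] = 1
          break
        }
      }
    }
    for (i = 1; i <= NR; i++)
      if (!(i in drop)) print line[i]
  }'
}
```

Something like `drop_cancelled < trades.csv > cleaned.csv` would run the plain
variant (the file names are placeholders) and `drop_cancelled 1 < trades.csv`
the comparing one. A "C"-line with no eligible "T"-line before it is still
removed on its own. The whole file is held in memory, so the caveat above
about large inputs applies.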

Perl makes this sort of thing rather easier.

--
John Cowan           http://www.ccil.org/~cowan            cowan@...
One of the oil men in heaven started a rumor of a gusher down in hell.  All
the other oil men left in a hurry for hell.  As he gets to thinking about
the rumor he had started he says to himself there might be something in
it after all.  So he leaves for hell in a hurry.    --Carl Sandburg