Re: sed strips CRs

by eblake

[adding bug-sed - see this thread in cygwin:
http://cygwin.com/ml/cygwin/2012-02/msg00313.html]

On 02/11/2012 10:19 AM, Earnie Boyd wrote:

>>> By this I assume you to mean that the -b option opens the input file
>>> in binary mode.  But the mount table the OP showed was already in
>>> binary mode.  Does sed not take that into consideration, I.E. it
>>> specifies the mode as a text file unless -b is specified, is this
>>> correct?
>>
>> Yes.  By default files are fopened using the "rt" mode on systems
>> supporting this mode.  This behaviour is hardcoded into upstream sed.
>
> But on Linux I would expect the "t" to be ignored and the file is open
> in "binary" mode anyway.

Personally, I think it is a bug that upstream sed is using 't' in
fopen() in the first place.  Linux does NOT have an 'rt' mode for a
reason: 't' is non-standard.  On cygwin, the preference used in
coreutils is that you get text mode by using 'r' and binary mode by
using 'rb', on the mount points where text mode matters; you should
almost never use 'rt' which forces text mode even on binary mounts.
That is, sed should be just fine using 'r' instead of 'rt', and it would
fix the perceived broken behavior on cygwin binary mounts.

But fixing this should be done upstream, and not in cygwin.
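
As a rough sketch of the convention described above, a program can restrict itself to the standard "r" and "rb" modes and leave the meaning of plain "r" to the platform. The helper names `open_input` and `count_bytes` are hypothetical and are not taken from sed or coreutils.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open an input file using only the ISO C mode
 * strings.  "r" lets the platform (or, on Cygwin, the mount type)
 * choose the translation; "rb" explicitly requests untranslated bytes.
 * The non-standard "rt" is deliberately never used. */
static FILE *open_input(const char *path, int binary)
{
    assert(path != NULL);
    return fopen(path, binary ? "rb" : "r");
}

/* Count the bytes delivered by a stream until EOF. */
static long count_bytes(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    while (fgetc(fp) != EOF)
        n++;
    return n;
}
```

On a POSIX system both modes deliver the same bytes; a difference can appear only where the C library distinguishes text streams from binary streams.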

--
Eric Blake   eblake@...    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: sed strips CRs

by Earnie

On Mon, Feb 13, 2012 at 9:12 AM, Eric Blake wrote:

>
> Personally, I think it is a bug that upstream sed is using 't' in
> fopen() in the first place.  Linux does NOT have an 'rt' mode for a
> reason: 't' is non-standard.  On cygwin, the preference used in
> coreutils is that you get text mode by using 'r' and binary mode by
> using 'rb', on the mount points where text mode matters; you should
> almost never use 'rt' which forces text mode even on binary mounts.
> That is, sed should be just fine using 'r' instead of 'rt', and it would
> fix the perceived broken behavior on cygwin binary mounts.
>
> But fixing this should be done upstream, and not in cygwin.

I've stayed away from voicing personal feelings.  While modifying
upstream would certainly resolve the issue of CRLF being read in
"text" mode, I believe that Cygwin should open the file descriptor in
binary mode regardless.  Note, though, that the difference between
sed's normal processing mode and sed -b is one of line mode versus
buffered mode, because you can't treat a binary data file as text
lines.  Modifying upstream would also break any systems (other than
Windows) that require 'rt' to operate in text mode; I don't know
whether any such systems exist.

--
Earnie
-- https://sites.google.com/site/earnieboyd


Re: sed strips CRs

by Paolo Bonzini-2

On 02/13/2012 03:12 PM, Eric Blake wrote:
> But fixing this should be done upstream, and not in cygwin.

As long as it's consistent with coreutils I'll certainly do the change.

Paolo


Re: sed strips CRs

by Corinna Vinschen-2

[Sent again.  I missed all the CC's in my previous reply.  Sorry!]

On Feb 13 15:37, Paolo Bonzini wrote:
> On 02/13/2012 03:12 PM, Eric Blake wrote:
> >But fixing this should be done upstream, and not in cygwin.
>
> As long as it's consistent with coreutils I'll certainly do the change.
>
> Paolo

Thanks!  Would you mind to CC the cygwin list when the next upstream
sed release is available?


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


Re: sed strips CRs

by Paolo Bonzini-2

On 02/13/2012 03:56 PM, Corinna Vinschen wrote:
> > As long as it's consistent with coreutils I'll certainly do the change.
>
> Thanks!  Would you mind to CC the cygwin list when the next upstream
> sed release is available?

Sure, it should be real soon now since a new release has been long overdue.

By the way, I'm still opening the script file with "rt".  I cannot think
of any case when you would want to keep CRs there.

Paolo


Re: sed strips CRs

by Corinna Vinschen-2

On Feb 13 16:22, Paolo Bonzini wrote:

> On 02/13/2012 03:56 PM, Corinna Vinschen wrote:
> >> As long as it's consistent with coreutils I'll certainly do the change.
> >
> >Thanks!  Would you mind to CC the cygwin list when the next upstream
> >sed release is available?
>
> Sure, it should be real soon now since a new release has been long overdue.
>
> By the way, I'm still opening the script file with "rt".  I cannot
> think of any case when you would want to keep CRs there.

Indeed, that sounds like the right thing to do.


Thank you,
Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


Re: sed strips CRs

by Earnie

On Mon, Feb 13, 2012 at 10:22 AM, Paolo Bonzini <bonzini@...> wrote:

> On 02/13/2012 03:56 PM, Corinna Vinschen wrote:
>>
>> > As long as it's consistent with coreutils I'll certainly do the change.
>>
>> Thanks!  Would you mind to CC the cygwin list when the next upstream
>> sed release is available?
>
>
> Sure, it should be real soon now since a new release has been long overdue.
>
> By the way, I'm still opening the script file with "rt".  I cannot think of
> any case when you would want to keep CRs there.

The case of

sed -e 's/something/nothing/g' myfile > myfile2

as it works in Cygwin today would mean that in the case of the OP's
drive settings myfile2 would not contain the CR.  Treating CR as white
space is the more proper thing to do, IMO.

--
Earnie
-- https://sites.google.com/site/earnieboyd


Re: sed strips CRs

by Paolo Bonzini-2

On 02/13/2012 04:43 PM, Earnie Boyd wrote:

>> By the way, I'm still opening the script file with "rt".  I cannot think of
>> any case when you would want to keep CRs there.
>
> The case of
>
> sed -e 's/something/nothing/g' myfile > myfile2
>
> as it works in Cygwin today would mean that in the case of the OP's
> drive settings myfile2 would not contain the CR.  Treating CR as white
> space is the more proper thing to do, IMO.

myfile is not the script file.  The script file is the one that you pass
to -f.

Using "rt" was introduced in both cases for Cygwin, so regressions on
other systems shouldn't be a problem.

Paolo


Re: sed strips CRs

by John Cowan-3

Paolo Bonzini scripsit:

> By the way, I'm still opening the script file with "rt".  I cannot think  
> of any case when you would want to keep CRs there.

You wouldn't, but the point is that "rt" isn't defined on Posix systems.
If it happens to be the same as "r", good, but that isn't guaranteed.
And the only time "rt" does anything different from "r" on a Win32 system
is when you have:

1) linked your executable with the system-supplied 'binmode.obj' file

2) set the global variable _fmode to O_BINARY

3) invoked _set_fmode(O_BINARY)

all of which make "r" synonymous with "rb".  Programs which don't do any
of these should use "r" rather than "rt", as it is guaranteed to do the
right thing for text on both Win32 and Posix systems.

--
You annoy me, Rattray!  You disgust me!         John Cowan
You irritate me unspeakably!  Thank Heaven,     cowan@...
I am a man of equable temper, or I should       http://www.ccil.org/~cowan
scarcely be able to contain myself before
your mocking visage.            --Stalky imitating Macrea


Re: sed strips CRs

by Paolo Bonzini-2

On 02/13/2012 08:42 PM, John Cowan wrote:
>> By the way, I'm still opening the script file with "rt".  I cannot think
>> of any case when you would want to keep CRs there.
> You wouldn't, but the point is that "rt" isn't defined on Posix systems.
> If it happens to be the same as "r", good, but that isn't guaranteed.

Yes, I added a configure-time check too.  I assume that if "rt" works,
it can be used instead of "r".
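
A feature test of this kind could be as small as the sketch below. It is not the actual check from sed's configure script; the function name `fopen_mode_works` is made up for illustration, and it only establishes that fopen() accepts the mode string, not which translation the mode performs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: return 1 if fopen() accepts `mode` for an
 * existing file, 0 if the call fails.  A configure-time test program
 * could call this with "rt" and use the answer to choose between
 * "rt" and plain "r". */
static int fopen_mode_works(const char *path, const char *mode)
{
    FILE *fp;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}
```

Because the result for "rt" depends on the C library, a caller would presumably fall back to "r" whenever the probe returns 0.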

> And the only time "rt" does anything different from "r" on a Win32 system
> is when you have:
>
> 1) linked your executable with the system-supplied 'binmode.obj' file
>
> 2) set the global variable _fmode to O_BINARY
>
> 3) invoked _set_fmode(O_BINARY)
>
> all of which make "r" synonymous with "rb".  Programs which don't do any
> of these should use "r" rather than "rt", as it is guaranteed to do the
> right thing for text on both Win32 and Posix systems.

No, "rt" also does something different than "r" on Cygwin with
binary-mounts.

If you meant that "rt" should be restricted to cygwin, that's also fine
by me but in general I prefer feature tests to OS tests.

Paolo


Re: sed strips CRs

by Earnie

On Mon, Feb 13, 2012 at 2:48 PM, Paolo Bonzini wrote:
>
> If you meant that "rt" should be restricted to cygwin, that's also fine by
> me but in general I prefer feature tests to OS tests.
>

Then it becomes Cygwin's problem.  I'm going to quote from
http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx

<quote>
t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
an EOF character on input. In files that are opened for
reading/writing by using "a+", fopen checks for a CTRL+Z at the end of
the file and removes it, if possible. This is done because using fseek
and ftell to move within a file that ends with CTRL+Z may cause fseek
to behave incorrectly near the end of the file.

In text mode, carriage return–linefeed combinations are translated
into single linefeeds on input, and linefeed characters are translated
to carriage return–linefeed combinations on output. When a Unicode
stream-I/O function operates in text mode (the default), the source or
destination stream is assumed to be a sequence of multibyte
characters. Therefore, the Unicode stream-input functions convert
multibyte characters to wide characters (as if by a call to the mbtowc
function). For the same reason, the Unicode stream-output functions
convert wide characters to multibyte characters (as if by a call to
the wctomb function).
</quote>
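
The input half of the quoted description can be imitated in memory, which may make the effect easier to see. The sketch below is only a simulation under stated assumptions, not the Microsoft runtime's implementation: the function name `text_mode_input` is hypothetical, and keeping a CR that is not followed by LF is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate, in memory, the input side of the text mode quoted above:
 * stop at the first CTRL+Z (0x1A) and turn each CR LF pair into a
 * single LF.  The buffer is rewritten in place and the new length is
 * returned.  A CR that is not followed by LF is kept as-is. */
static size_t text_mode_input(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == 0x1A)
            break;                      /* CTRL+Z acts as end of file */
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;                   /* drop the CR of a CR LF pair */
        buf[out++] = buf[in];
    }
    return out;
}
```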

So does Cygwin really want to specify "rt"?  I would rather sed
specify "rb" and treat the CR as white space.  I know that treating CR
as white space works well.
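
One narrow reading of "treat the CR as white space" is to read lines from a stream opened with "rb" and discard a CR that sits directly before the line terminator. The sketch below illustrates only that reading; `chomp_crlf` is a hypothetical helper, not code from sed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given one line as returned by fgets() from a
 * stream opened with "rb", remove a trailing LF and then a trailing
 * CR, so that CR LF and LF endings yield the same text.  A CR in the
 * middle of the line is left alone.  Returns the new length. */
static size_t chomp_crlf(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return len;
}
```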

--
Earnie
-- https://sites.google.com/site/earnieboyd


Re: sed strips CRs

by Corinna Vinschen-2

On Feb 15 01:13, Andrey Repin wrote:

> Greetings, Earnie Boyd!
>
> >>> The standard response to issues dealing with CRLF files is to point the
> >>> user to dos2unix and text mode mounts. This should be adequate without
> >>> the hidden behavior of sed/grep/awk and probably others.
> >>
> >> While your reasoning is sound, I prefer them to behave the way they are for my
> >> own goals.
> >> I can't possible alter between text and binary mounts, or use d2u on and off
> >> in an attempt to produce consistent and predictable end results.
>
> > Consistent behavior is what I see as the issue.  IMO, sed, grep and
> > awk should behave the same as on UNIX for consistent and predictable
> > behavior.  Treating CR as white space not only helps the Windows user
> > it also helps the Unix user when Windows files are transferred to it.
> > I know that for sed at least treating CR as white space works very
> > well.
>
> Most apparent: it breaking EOL matches.

This is about the script, not the input file.  From my POV both
approaches work fine here: either just ignoring the CR, or using "rt" mode.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat