CR and LF in chunk extension values

by Bjoern Hoehrmann

Hi,

  A chunk extension value is defined as either token or quoted-string. A
quoted-string allows CRs and LFs for folding and in escaped form under
RFC 2616; we have since outlawed the escaped form, and in headers, but
not chunk extension values, we now outlaw producing them for folding as
well. Accepting and processing the latter correctly still appears to be
a SHOULD level requirement; I am not sure about the former.

It appears that implementations usually just read a line and ignore
anything after the first ";" character at the beginning of a chunk. Perhaps
the specification should use a CRLF-free quoted-string instead for this;
if not, the considerations for obs-fold should apply to chunk extension
values as well, or obs-fold should not be used for chunk extension values
(which would require a separate quoted-string production as well).
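
To illustrate the concern, here is a minimal Python sketch of the "read a line, ignore everything after the first ';'" approach. The function names and the sample extension are hypothetical and not taken from any real implementation; the point is only to show how a CRLF folded inside a quoted-string would desynchronize such a reader.

```python
import io


def parse_chunk_size_line(line: bytes) -> int:
    """Return the chunk size from a chunk header line, ignoring extensions."""
    # Drop the line terminator, then everything from the first ";" onwards.
    size_field = line.rstrip(b"\r\n").split(b";", 1)[0].strip()
    return int(size_field.decode("ascii"), 16)


def read_first_chunk(stream) -> bytes:
    """Read one chunk, assuming the chunk header fits on a single line."""
    size = parse_chunk_size_line(stream.readline())
    return stream.read(size)


# A well-behaved chunk: the header is one line, so the data is read correctly.
ok = read_first_chunk(io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n"))

# A (hypothetical) extension value folded across two lines: the reader stops
# at the first CRLF and treats the rest of the quoted-string as chunk data.
folded = read_first_chunk(
    io.BytesIO(b'5;note="folded\r\n value"\r\nhello\r\n0\r\n\r\n')
)
```

In the second case the reader returns part of the extension value instead of the chunk data, which is why a CRLF-free quoted-string is attractive here.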

regards,
--
Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: CR and LF in chunk extension values

by mnot

Now #173:
   http://trac.tools.ietf.org/wg/httpbis/trac/ticket/173

We probably need to have a more general discussion of chunk-extensions  
as well...


On 18/06/2009, at 4:07 AM, Bjoern Hoehrmann wrote:

> Hi,
>
>  A chunk extension value is defined as either token or quoted-
> string. A
> quoted-string allows CRs and LFs for folding and in escaped form under
> RFC 2616; we have since outlawed the escaped form, and in headers, but
> not chunk extension values, we now outlaw producing them for folding  
> as-
> well. Accepting and processing the latter correctly still appears to  
> be
> a SHOULD level requirement; I am not sure about the former.
>
> It appears that implementations usually just read a line and ignore  
> any-
> thing after the first ";" character at the beginning of a chunk.  
> Perhaps
> the specification should use a CRLF-free quoted-string instead for  
> this;
> if not, the considerations for obs-fold should apply to chunk  
> extension
> values aswell, or obs-fold should not be used for chunk extension  
> values
> (which would require a separate quoted-string production aswell).
>
> regards,
> --
> Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://
> www.websitedev.de/
>


--
Mark Nottingham     http://www.mnot.net/



Re: CR and LF in chunk extension values

by Jamie Lokier

Bjoern Hoehrmann wrote:
> Hi,
>
>   A chunk extension value is defined as either token or quoted-string. A
> quoted-string allows CRs and LFs for folding and in escaped form under
> RFC 2616; we have since outlawed the escaped form, and in headers, but
> not chunk extension values, we now outlaw producing them for folding as-
> well. Accepting and processing the latter correctly still appears to be
> a SHOULD level requirement; I am not sure about the former.

Hmm.  I had no idea line folding was allowed inside a quoted-string,
and I expect I'm not the only one.  That's quite a surprise.

-- Jamie


#173: CR and LF in chunk extension values

by mnot

This was discussed in Stockholm, and there was agreement in the room  
that the proper way to address this is to disallow CR and LF in *any*  
quoted-string.

Comments?


On 25/06/2009, at 3:53 PM, Mark Nottingham wrote:

> Now #173:
>  http://trac.tools.ietf.org/wg/httpbis/trac/ticket/173
>
> We probably need to have a more general discussion of chunk-
> extensions as well...
>
>
> On 18/06/2009, at 4:07 AM, Bjoern Hoehrmann wrote:
>
>> Hi,
>>
>> A chunk extension value is defined as either token or quoted-
>> string. A
>> quoted-string allows CRs and LFs for folding and in escaped form  
>> under
>> RFC 2616; we have since outlawed the escaped form, and in headers,  
>> but
>> not chunk extension values, we now outlaw producing them for  
>> folding as-
>> well. Accepting and processing the latter correctly still appears  
>> to be
>> a SHOULD level requirement; I am not sure about the former.
>>
>> It appears that implementations usually just read a line and ignore  
>> any-
>> thing after the first ";" character at the beginning of a chunk.  
>> Perhaps
>> the specification should use a CRLF-free quoted-string instead for  
>> this;
>> if not, the considerations for obs-fold should apply to chunk  
>> extension
>> values aswell, or obs-fold should not be used for chunk extension  
>> values
>> (which would require a separate quoted-string production aswell).
>>
>> regards,
>> --
>> Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
>> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
>> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>
>


--
Mark Nottingham     http://www.mnot.net/



Re: #173: CR and LF in chunk extension values

by Henrik Nordstrom

On Tue, 2009-08-11 at 05:31 +1000, Mark Nottingham wrote:
> This was discussed in Stockholm, and there was agreement in the room  
> that the proper way to address this is to disallow CR and LF in *any*  
> quoted-string.
>
> Comments?

Escaped newlines or \0 characters in the form of quoted-pair are very likely
to cause many parsers to fail no matter where these are seen. I know I
have always understood this as a mechanism intended for quoting special
characters like " ( and ), and not including CTLs.

Regarding chunked encoding, allowing any newlines there is a very, very
bad idea. Folding is not supported there, and no one expects to see
newlines in the middle of a chunk header, quoted or not.

I would propose changing quoted-pair to restrict the allowable set to
non-CTLs to match most expectations on what values may be seen, not only
excluding CR or LF.

    quoted-pair  = "\" <any CHAR except CTLs>

instead of

    quoted-pair  = "\" CHAR
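
As an illustration of the difference between the two productions, here is a small Python sketch that expresses each as a regular expression. It assumes the RFC 2616 definitions of CHAR (any US-ASCII octet, 0-127) and CTL (octets 0-31 plus DEL); the names `QUOTED_PAIR_2616`, `QUOTED_PAIR_PROPOSED` and `accepts` are made up for the example.

```python
import re

# RFC 2616: quoted-pair = "\" CHAR, where CHAR is any octet in 0-127.
QUOTED_PAIR_2616 = re.compile(rb"\\[\x00-\x7f]")

# Proposed: quoted-pair = "\" <any CHAR except CTLs>, i.e. octets 0x20-0x7E.
QUOTED_PAIR_PROPOSED = re.compile(rb"\\[\x20-\x7e]")


def accepts(pattern, data: bytes) -> bool:
    """True if data is exactly one quoted-pair under the given pattern."""
    return pattern.fullmatch(data) is not None


# An escaped double quote is fine under both definitions.
quote_old = accepts(QUOTED_PAIR_2616, b'\\"')
quote_new = accepts(QUOTED_PAIR_PROPOSED, b'\\"')

# An escaped CR or NUL is only accepted by the RFC 2616 definition.
cr_old = accepts(QUOTED_PAIR_2616, b"\\\r")
cr_new = accepts(QUOTED_PAIR_PROPOSED, b"\\\r")
nul_new = accepts(QUOTED_PAIR_PROPOSED, b"\\\x00")
```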

Regards
Henrik



Re: #173: CR and LF in chunk extension values

by mnot

Right now, it's defined as:

> A string of text is parsed as a single word if it is quoted using
> double-quote marks.
>
>   quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
>   qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
>                  ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
>   obs-text       = %x80-FF
>
> The backslash character ("\") MAY be used as a single-character
> quoting mechanism only within quoted-string and comment constructs.
>
>   quoted-text    = %x01-09 /
>                    %x0B-0C /
>                    %x0E-FF ; Characters excluding NUL, CR and LF
>   quoted-pair    = "\" quoted-text

So it seems like we need to:

1) Consider removing OWS from qdtext, replacing it with space and tab  
only. While we could use BWS here, receivers are required to accept  
it, which I don't think is the desired effect. And,

2) Consider removing obs-text from qdtext, as it's a hole that a truck  
can drive through. Otherwise, modify it to explicitly disallow CTLs.  
And,

3) Restrict the allowable set of characters in quoted-text to disallow  
CTLs. VCHAR?
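
For item 3, the following Python sketch enumerates which control octets the quoted-text rule quoted above admits, compared with VCHAR. The octet ranges for quoted-text are copied from the draft text; CTL is taken as octets 0-31 plus DEL and VCHAR as %x21-7E. The helper `octets` is a hypothetical name used only here.

```python
def octets(*ranges):
    """Build a set of octet values from inclusive (low, high) ranges."""
    result = set()
    for low, high in ranges:
        result.update(range(low, high + 1))
    return result


CTL = octets((0x00, 0x1F), (0x7F, 0x7F))

# quoted-text = %x01-09 / %x0B-0C / %x0E-FF  (draft text quoted above)
QUOTED_TEXT_DRAFT = octets((0x01, 0x09), (0x0B, 0x0C), (0x0E, 0xFF))

# VCHAR = %x21-7E (visible ASCII characters)
VCHAR = octets((0x21, 0x7E))

# Control octets that may follow a backslash under each definition.
ctls_in_draft = sorted(QUOTED_TEXT_DRAFT & CTL)
ctls_in_vchar = sorted(VCHAR & CTL)
```

Under these assumptions the draft rule still admits every control octet except NUL, LF and CR, while VCHAR admits none. Note that VCHAR also excludes space and tab, so a bare VCHAR restriction would no longer allow those to be escaped.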



On 11/08/2009, at 8:50 AM, Henrik Nordstrom wrote:

> tis 2009-08-11 klockan 05:31 +1000 skrev Mark Nottingham:
>> This was discussed in Stockholm, and there was agreement in the room
>> that the proper way to address this is to disallow CR and LF in *any*
>> quoted-string.
>>
>> Comments?
>
> Escaped newlines or \0 characters in the form of quoted-pair very  
> likely
> to cause many parsers to fail no matter where these are seen. I know I
> have always understood this as a mechanism intended for quoting  
> special
> characters like " ( and ),  and not including CTLs.
>
> Regarding chunked encoding allowing any newlines there is a very very
> bad idea. Folding is not supported there, and no one expects to see
> newlines in the middle of a chunk header quoted or not.
>
> I would propose changing quoted-pair to restrict the allowable set to
> non-CTLs to match most expectations on what values may be seen, not  
> only
> excluding CR or LF.
>
>    quoted-pair  = "\" <any CHAR except CTLs>
>
> instead of
>
>    quoted-pair  = "\" CHAR
>
> Regards
> Henrik
>


--
Mark Nottingham     http://www.mnot.net/



Re: #173: CR and LF in chunk extension values

by mnot

That leaves us at:

1) Replace OWS in qdtext with space and tab, and
2) Remove obs-text from qdtext, and
3) Restrict quoted-text to VCHAR.

Milestone assigned for -08; barring any other discussion, we'll see  
what the editors come up with in that revision.


On 12/08/2009, at 4:43 PM, Mark Nottingham wrote:

> Right now, it's defined as:
>
>> A string of text is parsed as a single word if it is quoted using
>> double-quote marks.
>>
>>  quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
>>  qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
>>                 ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
>>  obs-text       = %x80-FF
>>
>> The backslash character ("\") MAY be used as a single-character
>> quoting mechanism only within quoted-string and comment constructs.
>>
>>  quoted-text    = %x01-09 /
>>                   %x0B-0C /
>>                   %x0E-FF ; Characters excluding NUL, CR and LF
>>  quoted-pair    = "\" quoted-text
>
> So it seems like we need to:
>
> 1) Consider removing OWS from qdtext, replacing it with space and  
> tab only. While we could use BWS here, receivers are required to  
> accept it, which I don't think is the desired effect. And,
>
> 2) Consider removing obs-text from qdtext, as it's a hole that a  
> truck can drive through. Otherwise, modify it to explicitly disallow  
> CTLs. And,
>
> 3) Restrict the allowable set of characters in quoted-text to  
> disallow CTLs. VCHAR?
>
>
>
> On 11/08/2009, at 8:50 AM, Henrik Nordstrom wrote:
>
>> tis 2009-08-11 klockan 05:31 +1000 skrev Mark Nottingham:
>>> This was discussed in Stockholm, and there was agreement in the room
>>> that the proper way to address this is to disallow CR and LF in  
>>> *any*
>>> quoted-string.
>>>
>>> Comments?
>>
>> Escaped newlines or \0 characters in the form of quoted-pair very  
>> likely
>> to cause many parsers to fail no matter where these are seen. I  
>> know I
>> have always understood this as a mechanism intended for quoting  
>> special
>> characters like " ( and ),  and not including CTLs.
>>
>> Regarding chunked encoding allowing any newlines there is a very very
>> bad idea. Folding is not supported there, and no one expects to see
>> newlines in the middle of a chunk header quoted or not.
>>
>> I would propose changing quoted-pair to restrict the allowable set to
>> non-CTLs to match most expectations on what values may be seen, not  
>> only
>> excluding CR or LF.
>>
>>   quoted-pair  = "\" <any CHAR except CTLs>
>>
>> instead of
>>
>>   quoted-pair  = "\" CHAR
>>
>> Regards
>> Henrik
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>
>


--
Mark Nottingham     http://www.mnot.net/



Re: #173: CR and LF in chunk extension values

by Julian Reschke

Mark Nottingham wrote:
> That leaves us at:
>
> 1) Replace OWS in qdtext with space and tab, and
> 2) Remove obs-text from qdtext, and
> 3) Restrict quoted-text to VCHAR.
>
> Milestone assigned for -08; barring any other discussion, we'll see what
> the editors come up with in that revision.
> ...

1)

   qdtext         = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
                  ; WSP / <VCHAR except DQUOTE and "\"> / obs-text
   obs-text       = %x80-FF

2)

What's the problem with obs-text? It doesn't contain controls...

3)

It seems to me that the purpose of quoted-text is to allow any character
in qdtext, plus DQUOTE and "\", which would make it

quoted-text = qdtext / DQUOTE / "\"

While we're at it, we probably should rename it to quoted-char, and also
add a short statement of what the semantics of a quoted-pair are.

BR, Julian


Re: #173: CR and LF in chunk extension values

by mnot


On 24/08/2009, at 11:18 PM, Julian Reschke wrote:

> Mark Nottingham wrote:
>> That leaves us at:
>> 1) Replace OWS in qdtext with space and tab, and
>> 2) Remove obs-text from qdtext, and
>> 3) Restrict quoted-text to VCHAR.
>> Milestone assigned for -08; barring any other discussion, we'll see  
>> what the editors come up with in that revision.
>> ...
>
> 1)
>
>  qdtext         = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
>                 ; WSP / <VCHAR except DQUOTE and "\"> / obs-text
>  obs-text       = %x80-FF

Looks good.


> 2)
>
> What's the problem with obs-text? It doesn't contain controls...

Mea culpa; misread that. Never mind #2.


> 3)
>
> It seems to me that the purpose of quoted-text is to allow any  
> character in qdtext, plus DQUOTE and "\", which would make it
>
> quoted-text = qdtext / DQUOTE / "\"
>
> While we're at it, we probably should rename it to quoted-char, and  
> also add a short statement what the semantics of a quoted-pair is.


I had to read that a few times, but I think I agree. However,
"quoted-char" may be confusing, as it's very similar to "quoted-pair".





--
Mark Nottingham     http://www.mnot.net/



Re: #173: CR and LF in chunk extension values

by Julian Reschke

Mark Nottingham wrote:

> ...
>> 3)
>>
>> It seems to me that the purpose of quoted-text is to allow any
>> character in qdtext, plus DQUOTE and "\", which would make it
>>
>> quoted-text = qdtext / DQUOTE / "\"
>>
>> While we're at it, we probably should rename it to quoted-char, and
>> also add a short statement what the semantics of a quoted-pair is.
>
>
> I had to read that a few times, but I think I agree. However,
> "quoted-char" may be confusing, as it's very similar to "quoted-pair".

And yes,

        qdtext / DQUOTE / "\"

is the same as

        WSP / VCHAR / obs-text

...but I think the former is more clear in that it adds DQUOTE and "\".
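
The claimed equivalence can be checked mechanically. The Python sketch below builds both octet sets, assuming the WSP-based qdtext from earlier in this thread, WSP = SP / HTAB, VCHAR = %x21-7E and obs-text = %x80-FF; the helper name `octets` is invented for the example.

```python
def octets(*ranges):
    """Build a set of octet values from inclusive (low, high) ranges."""
    result = set()
    for low, high in ranges:
        result.update(range(low, high + 1))
    return result


WSP = {0x09, 0x20}
VCHAR = octets((0x21, 0x7E))
OBS_TEXT = octets((0x80, 0xFF))
DQUOTE = {0x22}
BACKSLASH = {0x5C}

# qdtext = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
QDTEXT = WSP | octets((0x21, 0x21), (0x23, 0x5B), (0x5D, 0x7E)) | OBS_TEXT

former = QDTEXT | DQUOTE | BACKSLASH   # qdtext / DQUOTE / "\"
latter = WSP | VCHAR | OBS_TEXT        # WSP / VCHAR / obs-text

same_set = former == latter
```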

But.

quoted-pair is also used in comments. Are we ok with restricting the set
here as well? And, if yes, shouldn't we then also adjust the allowed set
for non-quoted characters in comments?

Currently it reads
(<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-latest.html#rfc.section.3.2>):

   comment        = "(" *( ctext / quoted-pair / comment ) ")"
   ctext          = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
                  ; OWS / <VCHAR except "(", ")", and "\"> / obs-text

To make it consistent with quoted-string it would need to change to:

   ctext          = BWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
                  ; BWS / <VCHAR except "(", ")", and "\"> / obs-text

Feedback appreciated,

Julian


Re: #173: CR and LF in chunk extension values

by Henrik Nordstrom

Should probably change topic here, but it's still relevant so keeping
the issue topic. Most of this is taking a more generic view of
quoted-pair, not isolated to chunk extension values.

On Tue, 2009-08-25 at 09:11 +0200, Julian Reschke wrote:

> quoted-pair is also used in comments. Are we ok with restricting the set
> here as well? And, if yes, shouldn't we then also adjust the allowed set
> for non-quoted characters in comments?

What? Restricting how? I thought we were talking about restricting the
use of CTLs?


Now some further rambling on the use of quoted-pair and the difficulties
this causes for parsers:


qdtext is for text within a quoted-string, and MUST NOT include '"' or
'\'. Those two must be produced as quoted-pair to be used within a
quoted-string.

    qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
                   ; OWS / <VCHAR except DQUOTE and "\"> / obs-text

ctext is the same but for comment, and MUST NOT include '(', ')' or '\'.
Those three must be produced as quoted-pair to be used within a comment.

    ctext          = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
                   ; OWS / <VCHAR except "(", ")", and "\"> / obs-text

Neither of qdtext or ctext allows for CTLs, except for HT or obsoleted
CRLF folding (from OWS).

The specification (2616) is very strict about where quoted-pair is allowed
to be used, but at the same time it is very subtle about where those areas
are, creating a large grey area where parsing is somewhat non-obvious.

It's the same question that has been raised earlier regarding comments. A
construct looking like a comment is only a comment if the header in
question is defined to allow comments; if not, it's literally part of the
header value.

A quoted-string is also only a quoted-string if the header in question is
defined to accept quoted-string; if not, it may be a literal part of the
header value even if it looks like a quoted-string (for a header
defined as taking *TEXT as its value; 2616 has no such headers, however).

RFC2616 BNF and relevant comments:

      generic-message = start-line
                        *(message-header CRLF)
                        CRLF
                        [ message-body ]
       message-header = field-name ":" [ field-value ]
       field-name     = token
       field-value    = *( field-content | LWS )
       field-content  = <the OCTETs making up the field-value
                        and consisting of either *TEXT or combinations
                        of token, separators, and quoted-string>

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

   A CRLF is allowed in the definition of TEXT only as part of a header
   field continuation.

   Comments can be included in some HTTP header fields by surrounding
   the comment text with parentheses. Comments are only allowed in
   fields containing "comment" as part of their field value definition.
   In all other fields, parentheses are considered part of the field
   value.

       comment        = "(" *( ctext | quoted-pair | comment ) ")"
       ctext          = <any TEXT excluding "(" and ")">

The allowable characters in *TEXT overlap completely with those of token,
separators and quoted-string, except that *TEXT does not allow CTLs other
than LWS (HT), and within *TEXT the '\' character has no special meaning.

This means that to properly parse '\' quoted constructs one must know
in detail every header processed, in order to know whether the '\' is
quoting the next character or is just a literal '\'.

Because of this it's important that the overall message parsing is the
same regardless of whether quoted-pair is processed or not, only producing
slightly different results in the raw header value. Or put in other
words, it needs to be possible to completely defer quoting and comment
processing until the header value as such is examined in detail, with
general message parsing using *TEXT for all header values. And for chunk
headers, *TEXT minus folding for the general message format, only needing
to dive into quoting etc. when eventually processing the chunk extension
values (if at all).
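
The two-stage approach described here could be sketched in Python as follows. This is only an illustration under stated assumptions: `split_header` and `unquote` are hypothetical helpers, "X-Example" is a made-up field name, and real parsers would also need to handle folding, whitespace rules and per-field grammars.

```python
def split_header(line: str):
    """Stage 1: generic parsing. The value is kept as raw text; no quoting
    or comment processing happens here."""
    name, _, value = line.partition(":")
    return name.strip().lower(), value.strip()


def unquote(value: str) -> str:
    """Stage 2: applied only when the field is defined to carry a
    quoted-string. Removes the surrounding quotes and resolves '\\' pairs."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("not a quoted-string")
    out = []
    chars = iter(value[1:-1])
    for ch in chars:
        if ch == "\\":
            # The backslash quotes the next character, whatever it is.
            ch = next(chars, None)
            if ch is None:
                raise ValueError("dangling backslash")
        out.append(ch)
    return "".join(out)


# Stage 1 gives the same raw value no matter how the field is defined.
name, raw = split_header(r'X-Example: "a\"b"')

# If the field takes plain text, the backslash is literal and raw is used
# as-is. If the field takes a quoted-string, the backslash is a quote mark.
as_quoted_string = unquote(raw)
```

The design point is that stage 1 never needs to know whether a backslash is significant; only the field-specific stage 2 does.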


Regarding the allowable characters, there is imho absolutely no need to
allow control characters anywhere in HTTP headers or chunk headers,
quoted or not, and it's additionally very likely that many parsers will
fail on such constructs, making them quite non-interoperable.

Additionally, if the allowed set of quoted characters is restricted to
exclude \x00, NL and CR, as already done in HTTPbis, then it becomes very
questionable from a technical point of view (ignoring parsing) to allow
the use of other CTLs in quoted form. The use of CTLs in header
values is very limited to begin with, basically only needed to support
transmission of (non-UTF8) multibyte character sets or binary non-text
data, in which case having those three excluded is already a significant
issue for such use.

So imho quoted-pair should be

    quoted-text = %x09 / %x20-%x7E / obs-text
                ; WSP / VCHAR / obs-text
    quoted-pair = "\" qchar

to match the use of *TEXT in 2616, making comments and quoted strings
all fit within *TEXT, as those constructs are only used in detailed forms
which should be a subset of the more generic *TEXT.


This reasoning is also consistent with the current field-content
definition using VTEXT etc..

    field-value    = *( field-content / OWS )
    field-content  = *( WSP / VCHAR / obs-text )

This field-content definition DOES NOT allow for CTLs other than HT.
Allowing quoted-pair to include CTLs other than HT is incompatible with
the above (from latest p1) definition of field-content.

If you look closely you'll notice that the quoted-text and field-content
definitions above are equal. Perhaps a common term should be defined for
that, similar to the *TEXT element used in 2616. There are probably more
places where using said term would make sense. And sorry, no, I do not
have a good suggested BNF name for this construct. TEXT would be
confused with 2616, and text in lower case is too generic to be used in
describing text. general-text?

Regards
Henrik



Re: #173: CR and LF in chunk extension values

by Julian Reschke

Henrik Nordstrom wrote:

> Should probably change topic here, but it's still relevant so keeping
> the issue topic. Most of this is taking a more generic view of
> quoted-pair, not isolated to chunk extension values.
>
> tis 2009-08-25 klockan 09:11 +0200 skrev Julian Reschke:
>
>> quoted-pair is also used in comments. Are we ok with restricting the set
>> here as well? And, if yes, shouldn't we then also adjust the allowed set
>> for non-quoted characters in comments?
>
> What? Restricting how? I thought we were talking about restricting the
> use of CTLs?

Yes. I wanted to confirm that we do that for quoted-strings *and*
comments. Do we?

> Now some further rambling on the use of quoted-pair and the difficulties
> this causes for parsers:
>
>
> qdtext is for text within a quoted-string, and MUST NOT include '"' or
> '\'. Those two must be produced as quoted-pair to be used within a
> quoted-string.
>
>     qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
>                    ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
>
> ctext is the same but for comment, and MUST NOT include '(', ')' or '\'.
> Those three must be produced as quoted-pair to be used within a comment.
>
>     ctext          = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
>                    ; OWS / <VCHAR except "(", ")", and "\"> / obs-text
>
> Neither of qdtext or ctext allows for CTLs, except for HT or obsoleted
> CRLF folding (from OWS).

Yes. But quoted-string and comment allow quoted-pair which currently
does allow CTLs.

> Specifications (2616) is very strict on where quoted-pair is alowed to
> be used, but it's at the same time very subtle where those areas are
> creating a large grey area where parsing is somewhat non-obvious.
>
> It's the same question as been raised earlier regarding comments. A
> construct looking like a comment is only a comment if the header in
> question is defined to allow comments, if not it's literally part of the
> header value.
>
> Quoted-string is also only quoted-string if the header in question is
> defined to accept quoted-string, if not it may be a literal part of the
> header value even if it may look like a quoted-string (for a header
> defined as taking *TEXT as value, 2616 has no such headers however)
>
> RFC2616 BNF and relevant comments:
>
>       generic-message = start-line
>                         *(message-header CRLF)
>                         CRLF
>                         [ message-body ]
>        message-header = field-name ":" [ field-value ]
>        field-name     = token
>        field-value    = *( field-content | LWS )
>        field-content  = <the OCTETs making up the field-value
>                         and consisting of either *TEXT or combinations
>                         of token, separators, and quoted-string>
>
>        TEXT           = <any OCTET except CTLs,
>                         but including LWS>
>
>    A CRLF is allowed in the definition of TEXT only as part of a header
>    field continuation.
>
>    Comments can be included in some HTTP header fields by surrounding
>    the comment text with parentheses. Comments are only allowed in
>    fields containing "comment" as part of their field value definition.
>    In all other fields, parentheses are considered part of the field
>    value.
>
>        comment        = "(" *( ctext | quoted-pair | comment ) ")"
>        ctext          = <any TEXT excluding "(" and ")">
>
> The allowable characters in *TEXT overlaps completely with token,
> separators and quoted-string in the allowable characters except that
> *TEXT do not allow CTLs other than LWS (HT), and within *TEXT the '\'
> character have no special meaning.
>
> Which means that to properly parse '\' quoted constructs one must know
> in detail every header processed in order to know if the '\' is quoting
> the next character or if it's just a literal '\'.

Yes.

> Because of this it's important that the overall message parsing is the
> same regardless if quoted-pair is processed or not, only producing
> slightly different results in the raw header value. Or put in other
> words, it needs to be possible to completely defer quoting and comment
> processing until the header value as such is examined in detail, with
> general message parsing using *TEXT for all header values. And for chunk
> headers *TEXT minus folding for the general message format, only needing
> to dive into quoting etc when eventually processing the chunk extension
> values (if at all).
>
>
> Regarding the allowable characters there imho is absolutely no need to
> allow for control characters anywhere in HTTP headers or chunk headers,
> quoted or not, and it's additionally very very likely many parsers will
> fail on such constructs making them quite non-interoperable.

Agreed.

> And additionally if restricting the allowed set of quoted characters to
> exclude \x00, NL and CR as already done in HTTPbis then it becomes very
> questionable from a technical point of view (ignoring parsing) to allow
> the use of other CTLs in quoted form. The use of having CTLs in header
> values is very limited to begin with, basically only needed to support
> transmission of (non-UTF8) multibyte charactersets or binary non-text
> data, in which case having those three excluded is already a signifcant
> issue for such use.

Yes.

> So imho quoted-pair should be
>
>     quoted-text = %x09 / %x20-%x7E / obs-text
>                 ; WSP / VCHAR / obs-text
>     quoted-pair = "\" qchar
>
> to match the use of *TEXT in 2616, making comments and quoted strings
> all fit within *TEXT as those constructs is only used in detailed forms
> which should be a subset of the more generic *TEXT.

"qchar" being...?

> This reasoning is also consistent with the current field-content
> definition using VTEXT etc..
>
>     field-value    = *( field-content / OWS )
>     field-content  = *( WSP / VCHAR / obs-text )
>
> This field-content definition DOES NOT allow for CTLs other than HT.
> Allowing quoted-pair to include CTLs other than HT is incompatible with
> the above (from latest p1) definition of field-content.
>
> If you look closely you'll notice the quoted-text and field-contents
> definitions above are equal. Perhaps a common term should be defined for
> that similar to the *TEXT element used in 2616. There is probably more
> places where using said term would make sense. And sorry, no I do not
> have a good suggested BNF name for this construct.. TEXT would be
> confusing with 2616 and text in lower case too generic to be used in
> describing text. general-text?
> ...

"characters"?

Anyway, my takeaway from your analysis is: "yes, CTLs need to be
disallowed both in comments and quoted-text", right?

BR, julian



Re: #173: CR and LF in chunk extension values

by Julian Reschke

Julian Reschke wrote:
> ...

OK, so my understanding is that we disallow all control characters
except HTAB in comment and quoted-string, escaped or not.

Proposed patch:
<http://trac.tools.ietf.org/wg/httpbis/trac/attachment/ticket/173/173.diff>.

Relevant changes in Part 1:

-- snip --

    A string of text is parsed as a single word if it is quoted using
    double-quote marks.

      quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
      qdtext         = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
                     ; WSP / <VCHAR except DQUOTE and "\"> / obs-text
      obs-text       = %x80-FF

    The backslash character ("\") can be used as a single-character
    quoting mechanism only within quoted-string and comment constructs:

      quoted-pair    = "\" ( WSP / VCHAR / obs-text )

    Note that quoted-pair includes those characters otherwise disallowed
    in quoted-string or comment (Section 3.2).

...

    Comments can be included in some HTTP header fields by surrounding
    the comment text with parentheses.  Comments are only allowed in
    fields containing "comment" as part of their field value definition.

      comment        = "(" *( ctext / quoted-pair / comment ) ")"
      ctext          = WSP / %x21-27 / %x2A-5B / %x5D-7E / obs-text
                     ; WSP / <VCHAR except "(", ")", and "\"> / obs-text

...

    Rules about implicit linear whitespace between certain grammar
    productions have been removed; now it's only allowed when
    specifically pointed out in the ABNF.  Control characters other than
    HTAB are no longer allowed in comment and quoted-string text (escaped
    or not).  Non-ASCII content in header fields and reason phrase has
    been obsoleted and made opaque (the TEXT rule was removed)
    (Section 1.2.2)

-- snip --
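
For readers who want to experiment with the proposed grammar, here is a Python sketch that transcribes the quoted-string and quoted-pair rules from the snippet above into a regular expression. It is an illustration only, not code from the patch; the name `parse_quoted_string` is hypothetical, and the sketch works on a single unfolded value, so header line folding is out of scope.

```python
import re

# qdtext = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
QDTEXT = rb"[\t \x21\x23-\x5b\x5d-\x7e\x80-\xff]"

# quoted-pair = "\" ( WSP / VCHAR / obs-text )
QUOTED_PAIR = rb"\\[\t \x21-\x7e\x80-\xff]"

# quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
QUOTED_STRING = re.compile(
    rb'"((?:' + QDTEXT + rb"|" + QUOTED_PAIR + rb')*)"'
)


def parse_quoted_string(data: bytes) -> bytes:
    """Validate data against the grammar and return the unescaped content."""
    match = QUOTED_STRING.fullmatch(data)
    if match is None:
        raise ValueError("not a valid quoted-string")
    # Each quoted-pair stands for the octet after the backslash.
    return re.sub(rb"\\(.)", rb"\1", match.group(1), flags=re.DOTALL)


def is_valid(data: bytes) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        parse_quoted_string(data)
    except ValueError:
        return False
    return True
```

With this transcription, `is_valid(b'"a\\"b"')` holds, while values containing a raw or escaped CR, LF, NUL or other control octet (apart from HTAB) are rejected.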

Feedback appreciated,

Julian


Re: #173: CR and LF in chunk extension values

by Henrik Nordstrom

On Tue, 2009-08-25 at 14:47 +0200, Julian Reschke wrote:

> > So imho quoted-pair should be
> >
> >     quoted-text = %x09 / %x20-%x7E / obs-text
> >                 ; WSP / VCHAR / obs-text
> >     quoted-pair = "\" qchar
> >
> > to match the use of *TEXT in 2616, making comments and quoted strings
> > all fit within *TEXT as those constructs is only used in detailed forms
> > which should be a subset of the more generic *TEXT.
>
> "qchar" being...?

A typo

    quoted-pair = "\" quoted-text

> > If you look closely you'll notice the quoted-text and field-contents
> > definitions above are equal. Perhaps a common term should be defined for
> > that similar to the *TEXT element used in 2616. There is probably more
> > places where using said term would make sense. And sorry, no I do not
> > have a good suggested BNF name for this construct.. TEXT would be
> > confusing with 2616 and text in lower case too generic to be used in
> > describing text. general-text?
> > ...
>
> "characters"?

Are WSP and obs-text characters? Other than that, no opinion either way.

> Anyway, my take away from your analysis is: "yes, CTLs need to be
> disallowed both in comments and quoted-text", right?

Yes. CTLs should be disallowed in quoted-pair except for those included
in WSP (HT).

Regards
Henrik



Re: #173: CR and LF in chunk extension values

by Henrik Nordstrom-5

On Tue, 2009-08-25 at 15:29 +0200, Julian Reschke wrote:
> Julian Reschke wrote:
> > ...
>
> OK, so my understanding is that we disallow all control characters
> except HTAB in comment and quoted-string, escaped or not.

Yes.

> Proposed patch:
> <http://trac.tools.ietf.org/wg/httpbis/trac/attachment/ticket/173/173.diff>.

>     specifically pointed out in the ABNF.  Control characters other than
>     HTAB are no longer allowed in comment and quoted-string text (escaped
>     or not).

Note: CRLF in the form of obs-fold is still allowed in both, just as it
has always been. It's just quoting using '\' which has been restricted.

> Feedback appreciated,

Looks good to me.

Regards
Henrik



Re: #173: CR and LF in chunk extension values

by Julian Reschke

Henrik Nordstrom wrote:

> On Tue, 2009-08-25 at 15:29 +0200, Julian Reschke wrote:
>> Julian Reschke wrote:
>>> ...
>> OK, so my understanding is that we disallow all control characters
>> except HTAB in comment and quoted-string, escaped or not.
>
> Yes.
>
>> Proposed patch:
>> <http://trac.tools.ietf.org/wg/httpbis/trac/attachment/ticket/173/173.diff>.
>
>>     specifically pointed out in the ABNF.  Control characters other than
>>     HTAB are no longer allowed in comment and quoted-string text (escaped
>>     or not).
>
> Note: CRLF in the form of obs-fold is still allowed in both, just as it
> has always been. It's just quoting using '\' which has been restricted.
>
>> Feedback appreciated,
>
> Looks good to me.
> ...

OK, I have applied the change with
<http://trac.tools.ietf.org/wg/httpbis/trac/changeset/686>.

BR, Julian


Re: #173: CR and LF in chunk extension values

by Henrik Nordstrom-5

On Thu, 2009-08-27 at 11:53 +0200, Julian Reschke wrote:

> OK, I have applied the change with
> <http://trac.tools.ietf.org/wg/httpbis/trac/changeset/686>.

Looking again.. and no it's not entirely fine.

ctext and qdtext should not be changed from OWS to WSP. The change is
only in quoted-text. We can not disallow folding here.

Sorry for not seeing this earlier.

Regards
Henrik



Re: #173: CR and LF in chunk extension values

by Julian Reschke

Henrik Nordstrom wrote:

> On Thu, 2009-08-27 at 11:53 +0200, Julian Reschke wrote:
>
>> OK, I have applied the change with
>> <http://trac.tools.ietf.org/wg/httpbis/trac/changeset/686>.
>
> Looking again.. and no it's not entirely fine.
>
> ctext and qdtext should not be changed from OWS to WSP. The change is
> only in quoted-text. We can not disallow folding here.
>
> Sorry for not seeing this earlier.
> ...

It happens; thanks for checking anyway (and this was exactly the reason
I wanted people to verify this change :-).

For now, I undid the change with
<http://trac.tools.ietf.org/wg/httpbis/trac/changeset/687>.

It appears that we *do* have consensus for disallowing controls in
quoted-pairs, thus for:

   quoted-pair    = "\" ( WSP / VCHAR / obs-text )

However, if that's all that we do, we won't have addressed issue #173
after all.

Proposal:

- add a new issue for disallowing CTLs in quoted-pair

- address #173 by tuning the definition of chunk-ext-val

BR, Julian


Re: #173: CR and LF in chunk extension values

by Henrik Nordstrom-5

On Thu, 2009-08-27 at 14:03 +0200, Julian Reschke wrote:

> It appears that we *do* have consensus for disallowing controls in
> quoted-pairs, thus for:
>
>    quoted-pair    = "\" ( WSP / VCHAR / obs-text )

Yes.

> However, if that's all that we do, we won't have addressed issue #173
> after all.

Indeed.

> Proposal:
>
> - add a new issue for disallowing CTLs in quoted-pair

Yes.

> - address #173 by tuning the definition of chunk-ext-val

Which means defining a new variant of quoted-string, one that does not
allow folding, for use in chunk-ext-val.

    chunk-ext-val    = token / quoted-string-nf
    quoted-string-nf = DQUOTE *( qdtext-nf / quoted-pair ) DQUOTE
    qdtext-nf        = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
                     ; WSP / <VCHAR except DQUOTE and "\"> / obs-text


assuming quoted-pair is fixed as discussed.

Perhaps it should also be noted in the text that folding is explicitly
forbidden in chunk headers.

Comments are thankfully not allowed in chunk extensions from what I can tell.
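
To make the effect of the no-folding grammar above concrete, here is a rough Python sketch of a validator for the proposed chunk-ext-val. It assumes quoted-pair is "\" ( WSP / VCHAR / obs-text ) as discussed, that obs-text is %x80-FF, and that token uses the usual tchar set from later HTTP specifications; none of the names below come from the thread.

```python
import re

# Assumption: token characters as in later HTTP specs (tchar); the thread
# itself does not spell out "token".
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# qdtext-nf = WSP / %x21 / %x23-5B / %x5D-7E / obs-text  (no CR, no LF)
QDTEXT_NF = r"[\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]"
# quoted-pair = "\" ( WSP / VCHAR / obs-text ), assumed fixed as discussed
QUOTED_PAIR = r"\\[\t \x21-\x7E\x80-\xFF]"
QUOTED_STRING_NF = rf'"(?:{QDTEXT_NF}|{QUOTED_PAIR})*"'
CHUNK_EXT_VAL = re.compile(rf"(?:{TOKEN}|{QUOTED_STRING_NF})")


def is_chunk_ext_val(raw: bytes) -> bool:
    """True if raw is a token or a quoted-string-nf (no folding, no CTLs)."""
    # Latin-1 maps each octet to exactly one code point, so the octet
    # ranges in the ABNF can be used unchanged in the pattern.
    text = raw.decode("latin-1")
    return CHUNK_EXT_VAL.fullmatch(text) is not None
```

With this, a value such as b'"a b"' is accepted, while a folded value such as b'"a\r\n b"' or an escaped CR is rejected, which is the behaviour the proposal is after.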

Regards
Henrik



Re: #173: CR and LF in chunk extension values

by Roy T. Fielding

On Aug 27, 2009, at 5:03 AM, Julian Reschke wrote:

> It appears that we *do* have consensus for disallowing controls in  
> quoted-pairs, thus for:
>
>   quoted-pair    = "\" ( WSP / VCHAR / obs-text )
>
> However, if that's all that we do, we won't have addressed issue
> #173 after all.
>
> Proposal:
>
> - add a new issue for disallowing CTLs in quoted-pair

I suggest we make the issue "Disallow quoted-pair productions that are
never used in practice nor needed for parsing", with the fix being

    quoted-pair    = "\" ( "\" / DQUOTE / "(" / ")" )

....Roy
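
The difference between the two quoted-pair proposals in this message can be illustrated with a small Python sketch. The set names BROAD and NARROW and both functions are hypothetical; BROAD assumes obs-text is %x80-FF, and the input is assumed to be the inside of a quoted-string or comment decoded as Latin-1.

```python
def escaped_chars(text: str):
    """Yield each character that directly follows a backslash."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if i + 1 == len(text):
                raise ValueError("dangling backslash")
            yield text[i + 1]
            i += 2                      # skip the backslash and its target
        else:
            i += 1


# quoted-pair = "\" ( WSP / VCHAR / obs-text )
BROAD = ({"\t", " "}
         | {chr(c) for c in range(0x21, 0x7F)}
         | {chr(c) for c in range(0x80, 0x100)})

# quoted-pair = "\" ( "\" / DQUOTE / "(" / ")" )
NARROW = {"\\", '"', "(", ")"}


def escapes_allowed(text: str, permitted: set) -> bool:
    """True if every escaped character in text is in the permitted set."""
    return all(ch in permitted for ch in escaped_chars(text))
```

Under these assumptions an escaped "t" passes the broad rule but not the narrow one, and an escaped CR passes neither.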
