(diverting to www-talk, too...)
On 11 Feb 2009, at 01:20, Mark Nottingham wrote:
> Yeah, I'm not completely happy with it yet. The thought was that
> since blank lines don't introduce ambiguity here, they're not
> harmful. OTOH one of my goals for the format is to allow existing
> HTTP header and MIME parsers (e.g., in Python) to be used on the
> format, and they very well may barf on a blank line.
Well, they'll barf on blank lines and declare the header over;
changing that within the parser (or just restarting it on the rest of
the file) should be relatively cheap.
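For illustration, here is a minimal sketch of the "restart it on the rest of the file" approach. It assumes that Python's standard-library `email.parser.HeaderParser` is the existing parser being reused, and that blank lines carry no meaning between fields. `parse_host_meta` is a hypothetical name and does not come from the draft:

```python
import re
from email.parser import HeaderParser


def parse_host_meta(text):
    """Return (name, value) pairs from host-meta text that may contain blank lines.

    Hypothetical helper: a MIME header parser treats the first blank line as
    the end of the header block, so we split on runs of blank lines and run
    the parser once per block instead.
    """
    parser = HeaderParser()  # default compat32 policy; values come back as str
    fields = []
    normalised = text.replace("\r\n", "\n").strip("\n")
    # Split wherever one or more blank (or whitespace-only) lines occur.
    for block in re.split(r"\n(?:[ \t]*\n)+", normalised):
        if not block.strip():
            continue
        msg = parser.parsestr(block + "\n")
        fields.extend(msg.items())
    return fields


example = "Link: <http://example.com/a>\n\nLink: <http://example.com/b>\nX-Other: y\n"
print(parse_host_meta(example))
```

Splitting before parsing keeps the parser itself untouched, which is the point if the goal is reusing off-the-shelf code.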
BTW, I notice that this draft is silent on the HTTP header syntax's
combining feature for multiple occurrences of the same field (RFC 2616,
Section 4.2, last paragraph); I suspect that to be one of the more
likely causes of surprises if HTTP header parsers are reused. (No
such risk with MIME parsers.)
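To make the surprise concrete, here is a sketch of what an HTTP-style parser may do under RFC 2616, Section 4.2. It appends repeated field-values to the first one, separated by a comma. Whether any particular library does exactly this is an assumption, and `combine_like_http` is a hypothetical name:

```python
def combine_like_http(fields):
    """Merge repeated field-names into one value, as RFC 2616 section 4.2 permits.

    Hypothetical helper: field-names are compared case-insensitively and each
    later value is appended to the earlier one after a comma.
    """
    combined = {}
    for name, value in fields:
        key = name.lower()
        if key in combined:
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


fields = [
    ("Link", "<http://example.com/a>"),
    ("Link", "<http://example.com/b>"),
    ("X-Note", "one, two"),  # a value that already contains a comma
    ("X-Note", "three"),
]
print(combine_like_http(fields))
```

RFC 2616 only allows this for fields whose value is defined as a comma-separated list. The `X-Note` case shows why: after merging, nobody can tell whether there were two values or three.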
Finally, why disallow whitespace-stuffed folding (continuation lines
that begin with whitespace)? It's pretty useful for keeping long lines
editable, and I suspect that we're assuming /host-meta to be the product
of some human with emacs in their hands. ;-) Implementing it is easy,
and comes for free if existing parsers are used.
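As a rough sense of how little code folding needs, here is a sketch of unfolding along the lines of RFC 2616, Section 2.2. A line that starts with a space or tab continues the previous field, and the fold may be replaced by a single space. `unfold` is a hypothetical helper, not something the draft defines:

```python
import re


def unfold(text):
    """Join continuation lines onto the previous line (hypothetical helper).

    Each line break followed by one or more spaces or tabs is treated as a
    fold and replaced with a single space, as RFC 2616 section 2.2 allows.
    """
    return re.sub(r"\r?\n[ \t]+", " ", text)


folded = (
    "Link: <http://example.com/a>;\n"
    "    rel=\"describedby\"\n"
    "X-Other: y\n"
)
print(unfold(folded))
```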
> So, the right thing to do might be to explicitly disallow them, both
> in BNF and prose. Eran, thoughts?
I'd just prefer that we not have the BNF say "no empty lines" while the
prose says the opposite, with a SHOULD.
>>> 5. Minting New meta-fields
>>
>>> Applications that wish to mint new meta-fields for use in the
>>> host-meta format MUST register them in the host-meta field-registry,
>>> following the procedures in Section 7.2. Field-names
>>> MUST conform to the field-name ABNF Section 3, and field-value
>>> syntax MUST be well-defined (e.g., using ABNF, or a reference to
>>> the syntax of an existing header field-value). Field-values SHOULD
>>> use the ISO-859-1 character encoding. If a field-value applies to
>>> a scope other than the entire authority, that scope MUST be
>>> well-defined.
>>
>> Editorial nit: ISO-8859-1 is missing an 8 here.
>
> That one always gets me, thanks.
>
>> More substantially, is there any particular reason to not just go
>> with UTF-8 here? After all, the content type is
>> *application*/host-meta anyway.
>
> Same as above; allowing existing parsers and serialisation libraries
> to be used. That said, there have been many arguments in HTTPbis
> that existing libraries won't harm non-ASCII characters in transit,
> but IIRC no one has actually gone out and surveyed what they do...
That suggests that it's a coin toss, unless the mythical "someone"
does that work. May I, in that event, suggest that we use a coin
biased in favor of broader internationalization, i.e., UTF-8?
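Offered as an illustration rather than the survey Mark mentions: Python's `iso-8859-1` codec maps every byte value 0x00-0xFF to a code point. A library that decodes and re-encodes field-values that way would therefore pass UTF-8 bytes through unchanged. The damage shows up only where the value is interpreted as text. `latin1_round_trip` is a hypothetical name:

```python
def latin1_round_trip(raw):
    """Decode bytes as ISO-8859-1 and re-encode them, as a Latin-1-only library might."""
    # Python's "iso-8859-1" codec maps each byte 0x00-0xFF to U+0000-U+00FF,
    # so the decode never fails and the encode restores the same bytes.
    return raw.decode("iso-8859-1").encode("iso-8859-1")


utf8_value = "Zürich".encode("utf-8")
print(latin1_round_trip(utf8_value) == utf8_value)  # bytes survive transit
print(utf8_value.decode("iso-8859-1"))  # mojibake if displayed as Latin-1 text
print(utf8_value.decode("utf-8"))
```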