Re: Definitions limit on label length in UTF-8


by "Martin J. Dürst"

Hello John,

[Dave, this is Cc'ed to you because of some discussion relating to
draft-iab-idn-encoding-00.txt.]

[I'm also cc'ing public-iri@... because of the IRI-related issue at
the end.]

[Everybody, please remove the Cc fields when they are unnecessary.]


Overall, I'm afraid that on this issue, more convoluted explanations
won't convince me or anybody else, but I'll nevertheless try to answer
your discussion below point-by-point.

What I (and I guess others on this list) really would like to know is
whether you have any CONCRETE reports or evidence regarding problems
with IDN labels that are longer than 63 octets when expressed in UTF-8.

Otherwise, Michel has put it much better than me: "given the lack of
issues with IDNA2003 on that specific topic there are no reasons to
introduce an incompatible change".


On 2009/09/12 0:47, John C Klensin wrote:

>
> --On Friday, September 11, 2009 17:37 +0900 "\"Martin J.
> Dürst\""<duerst@...>  wrote:
>
>>> (John claimed that the email context required such a
>>> rule, but I did not bother to confirm that.)
>> Given dinosaur implementations such as sendmail, I can
>> understand the concern that some SMTP implementations may not
>> easily be upgradable to use domain names with more than 255
>> octets or labels with more than 63 octets. In that case, I
>> would have expected at least a security warning at
>> http://tools.ietf.org/html/rfc4952#section-9 (EAI is currently
>> written in terms of IDNA2003, and so there are no length
>> restrictions on U-labels).
>
> I obviously have not been explaining this very well.  The
> problem is not "dinosaur implementations"

Okay, good.

> but a combination of
> two things (which interact):
>
> (1) Late resolution of strings, possibly through APIs that
> resolve names in places that may not be the public DNS.
> Systems using those APIs may keep strings in UTF-8 until very
> late in the process, even passing the UTF-8 strings into the
> interface or converting them to ACE form just before calling the
> interface.  Either way, because other systems have come to rely
> on the 63 octet limit, strings longer than 63 characters pose a
> risk of unexpected problems.  The issues with this are better
> explained in draft-iab-idn-encoding-00.txt, which I would
> strongly encourage people in this WG to go read.

I have indeed read draft-iab-idn-encoding-00.txt (I sent comments to the
author and the IAB and copied this list). That document mentions the
length restrictions, as essentially the only restrictions in DNS itself,
rather than in things on top of it. That document also (well, mainly)
discusses the issue of names being handed down into APIs in various
forms (UTF-8, UTF-16, punycode, legacy encodings,...), and being
resolved by various mechanisms (DNS, NetBIOS, mDNS, hosts file,...), and
the problem that these mechanisms may use and expect different encodings
for non-ASCII characters.

However, I haven't found any mention, nor even a hint, in that document,
of a need to restrict punycode labels to less than 63 octets when
expressed in UTF-8.

The document mentions (as something that might happen, but shouldn't)
that an application may pass a UTF-8 string to something like
getaddrinfo, and that string may be passed directly to the DNS. First,
if this happens, IDNA has already lost. Second, whether the string is
UTF-8 or pure ASCII, if the API isn't prepared to handle labels longer
than 63 octets and overall names longer than 255 octets defensively
(i.e. return something like 'not found'), then the programmer should be
fired. Anyway, in that case, the problem isn't with UTF-8.
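
A minimal sketch of such a defensive check, in Python. The function name
fits_dns_limits and the constant names are made up for illustration; the
limits themselves (63 octets per label, 255 octets per name in wire form)
are the RFC 1035 ones discussed in this thread. The check counts octets and
does not care whether they happen to be ASCII or UTF-8.

```python
MAX_LABEL_OCTETS = 63   # per-label limit from RFC 1035
MAX_NAME_OCTETS = 255   # limit on the whole name in wire form


def fits_dns_limits(name: bytes) -> bool:
    """Return True if a dot-separated name respects the DNS length limits.

    Hypothetical helper: it only counts octets, so it behaves the same
    for ASCII and for UTF-8 input.
    """
    if name.endswith(b"."):
        name = name[:-1]          # ignore one trailing root dot
    labels = name.split(b".")
    if any(not 1 <= len(label) <= MAX_LABEL_OCTETS for label in labels):
        return False
    # Wire form: one length octet per label, plus the terminating zero octet.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= MAX_NAME_OCTETS


print(fits_dns_limits(b"example.com"))
print(fits_dns_limits(b"a" * 64 + b".com"))
print(fits_dns_limits(("\u00fc" * 40 + ".example").encode("utf-8")))
```

An API in front of the DNS could run a check of this kind and answer
'not found' when it fails, instead of passing the name on.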

What draft-iab-idn-encoding-00.txt essentially points out is that
different name resolution services use different encodings for non-ASCII
characters, and that currently different users (meaning applications) of
a name resolution API may assume different encodings for non-ASCII
characters, which creates all kinds of chances for errors. Some
heuristics may help in some cases, but the right solution (as with all
cases where characters, and in particular non-ASCII ones, are involved)
is to clearly say where which encoding is used. A very simple example
for this is GetAddrInfoW, which assumes UTF-16.

The only potential problem that I see from the discussion in
draft-iab-idn-encoding-00.txt is the following: Some labels containing
non-ASCII characters that fit into 63 octets in punycode and therefore
can be resolved with the DNS may not be resolvable with some other
resolution service because that service may use a different encoding
(and may or may not have different length limits).

I have absolutely nothing against some text in a Security Considerations
section or in Rationale pointing out that if you want to set up some
name or label for resolution via multiple different resolution services,
you have to take care that you choose your names and labels so that they
meet the length restrictions for all those services. But that doesn't
imply at all that we have to artificially restrict the length of
punycode labels by counting octets in UTF-8.
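
To make the difference between the two ways of counting concrete, here is a
small Python sketch. It relies on Python's built-in "idna" codec, which
implements the IDNA2003 ToASCII operation; the sample label (40 copies of
'ü') is invented for illustration.

```python
# A made-up label: 40 copies of U+00FC (LATIN SMALL LETTER U WITH DIAERESIS).
u_label = "\u00fc" * 40

# Python's built-in "idna" codec performs the IDNA2003 ToASCII operation.
a_label = u_label.encode("idna")

code_points = len(u_label)
utf8_octets = len(u_label.encode("utf-8"))
a_label_octets = len(a_label)

print("code points:   ", code_points)
print("UTF-8 octets:  ", utf8_octets)
print("A-label octets:", a_label_octets)

# Compare each form against the 63-octet DNS label limit.
print("A-label fits in 63 octets:", a_label_octets <= 63)
print("UTF-8 fits in 63 octets:  ", utf8_octets <= 63)
```

A label like this one resolves fine under IDNA2003, but would be rejected by
a rule that counts octets in the UTF-8 form.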


> (2) The "conversion of DNS name formats" issue that has been
> extensively discussed as part of the question of alternate label
> separators (sometimes described in our discussions as
> "dot-oids").  Applications that use domain names, including
> domain names that are not going to be resolved (or even looked
> up), must be able to be freely and accurately converted between
> DNS-external (dot-separated labels) and DNS-internal
> (length-string pairs) formats _without_ knowing whether they are
> IDNs or not.

I'm not exactly sure what you mean here. If you want to say "without
checking whether they contain xn-- prefixes and punycode or not", then I
can agree, but that cannot motivate a UTF-8 based length restriction.

If you say that applications, rather than first converting U-label ->
A-label and then converting from dot-separated to length-string
notation, have to be able to first convert to length-string notation and
then convert U-labels to A-labels, then I contend that nobody in their
right mind would do it that way, and even less if "dot-oids" are
involved. For starters, U-labels don't have a fixed encoding.
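
The order I consider natural (first U-label -> A-label, then dot-separated
-> length-string) can be sketched in a few lines of Python. The function
name to_wire is hypothetical, and Python's built-in "idna" codec (IDNA2003)
stands in for whatever conversion an implementation actually uses.

```python
def to_wire(name: str) -> bytes:
    """Convert a dot-separated name to DNS length-string (wire) form.

    Hypothetical helper: each label is first converted to its A-label,
    and only then prefixed with its length octet.
    """
    wire = bytearray()
    for label in name.split("."):
        a_label = label.encode("idna")      # step 1: U-label -> A-label
        if not 1 <= len(a_label) <= 63:     # the limit applies to the A-label
            raise ValueError("label does not fit in 1..63 octets")
        wire.append(len(a_label))           # step 2: length octet ...
        wire += a_label                     # ... followed by the label octets
    wire.append(0)                          # zero-length root label
    return bytes(wire)


print(to_wire("b\u00fccher.example"))
```

With this order, the length octet always describes ASCII octets, so the
encoding of the U-label never enters into the length-string format.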

> As discussed earlier, one of several reasons for
> that requirement is that, in non-IDNA-aware contexts, labels in
> non-IDNA-aware applications or contexts may be perfectly valid
> as far as the DNS is concerned, because the only restriction the
> DNS (and the normal label type) imposes is "octets".

If and where somebody has binary labels, of course these binary labels
must not be longer than 63 octets. But IDNA doesn't use binary labels,
and doesn't stuff UTF-8 into DNS protocol slots, so for IDNA, any length
restrictions on UTF-8 are irrelevant.

> That
> length-string format has a hard limit of 63 characters that can
> be exceeded only if one can figure out how to get a larger
> number into six bits (see RFC1035, first paragraph of Section
> 3.1, and elsewhere).

I very well know that the 63 octets (not characters) limit is a hard
one. In the long run, one might imagine an extension to DNS that uses
another label format, without this limitation, but there is no need at
all to go there for this discussion.
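
For readers who want to see where the 63 comes from: RFC 1035 requires the
two high-order bits of a label's length octet to be zero, leaving six bits
for the length, and uses the bit pattern 11 for compression pointers. A
short Python sketch (the function name is made up for illustration):

```python
def classify_length_octet(octet: int):
    """Interpret the first octet of a label in DNS wire format (RFC 1035)."""
    top_bits = octet & 0xC0
    if top_bits == 0x00:
        # Ordinary label: the low six bits hold the length, 0..63.
        return ("label", octet & 0x3F)
    if top_bits == 0xC0:
        # Compression pointer (RFC 1035, Section 4.1.4).
        return ("pointer", None)
    # The 01 and 10 combinations are reserved in RFC 1035.
    return ("reserved", None)


print(classify_length_octet(63))
print(classify_length_octet(64))
print(classify_length_octet(0xC0))
```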

> If we permit longer U-label strings on the
> theory that the only important restriction is on A-labels, we
> introduce new error states into the format conversion process.

For IDNA, only A-labels get sent through the DNS protocol, so the length
restrictions for labels are relevant only there. If somebody gets
this wrong in the format conversion process (we currently don't have any
reports on that), then that's their problem (and we can point it out in
a Security section or so).

> If this needs more explanation somewhere (possibly in
> Rationale), I'm happy to try to do that.  But I think
> eliminating the restriction would cause far more problems than
> it is worth.

It hasn't caused ANY problems in IDNA2003. There is nothing new in
IDNA2008 that would motivate a change. *Running code*, one of the
guidelines of the IETF, shows that the restriction is unnecessary.


> I note that, while I haven't had time to respond, some of the
> discussion on the IRI list has included an argument that domain
> names in URIs cannot be restricted to A-label forms but must
> include %-escaped UTF-8 simply because those strings might not
> be public-DNS domain names but references to some other database
> or DNS environment.

It's not 'simply because'. It's first and foremost because of the
syntactic uniformity of URIs, and the fact that it's impossible to
identify all domain names in a URI. The usual slot after the '//' is
easy, and scheme-specific processing (which is not what URIs and IRIs are
about) may be able to deal with some of 'mailto', but what do you do
about domain names in query parts? Also, this syntax is part of RFC
3986, STD 66, a full IETF Standard.

Overall, it's just a question of what escaping convention should be
used. URIs have their specific escaping convention (%-encoding), and DNS
has its specific escaping convention (punycode).
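
The two conventions side by side, as a minimal Python sketch. It uses
urllib.parse.quote for %-encoding of the UTF-8 form and Python's built-in
"idna" codec (IDNA2003) for the punycode form; the host name is a made-up
example.

```python
from urllib.parse import quote

host = "b\u00fccher.example"   # made-up internationalized host name

# URI convention: %-encode the octets of the UTF-8 form.
percent_encoded = quote(host)

# DNS convention: punycode with the "xn--" prefix (an A-label).
a_label_form = host.encode("idna").decode("ascii")

print(percent_encoded)
print(a_label_form)
```

Both results are pure ASCII; they differ only in which escaping convention
carries the non-ASCII character.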

Also please note that the IRI spec doesn't prohibit the use of punycode
when converting to URIs.

In addition, please note that at least my personal implementation
experience (adding IDN support to Amaya) shows that the overhead of
supporting %-encoding in domain names in URIs is minimal, and helps
streamline the implementation.

> It seems to me that one cannot have it
> both ways -- either the application knows whether a string is a
> public DNS reference that must conform _only_ to IDNA
> requirements (but then can be restricted to A-labels) or the
> application does not know and therefore must conform to DNS
> requirements for label lengths.

There is absolutely no need to restrict *all* references just because
*some of them* may use other resolver systems with other length
restrictions (which may be "63 octets per label when measured in UTF-8"
or something completely different). It would be very similar to saying
"Some compilers/linkers can only deal with identifiers 6 characters or
shorter, so all longer identifiers are prohibited."

> For our purposes, the only
> sensible way, at least IMO, to deal with this is to require
> conformance to both sets of rules, i.e., 63 character maximum
> for A-labels and 63 character maximum for U-labels.

As far as I understand punycode, it's impossible to encode a Unicode
character in less than one octet. This means that a maximum of 63
*characters* for U-labels is automatically guaranteed by a maximum of 63
characters/octets for A-labels.

However, Defs clearly says "length in octets of the UTF-8 form", so I
guess this was just a slip of your fingers.

Regards,    Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


Re: Definitions limit on label length in UTF-8

by "Martin J. Dürst"

Some additional points below.

On 2009/09/12 12:14, Martin J. Dürst wrote:
> Hello John,
> On 2009/09/12 0:47, John C Klensin wrote:
>>
>> --On Friday, September 11, 2009 17:37 +0900 "\"Martin J.
>> Dürst\""<duerst@...> wrote:

>> I note that, while I haven't had time to respond, some of the
>> discussion on the IRI list has included an argument that domain
>> names in URIs cannot be restricted to A-label forms but must
>> include %-escaped UTF-8 simply because those strings might not
>> be public-DNS domain names but references to some other database
>> or DNS environment.
>
> It's not 'simply because'. It's first and foremost because of the
> syntactic uniformity of URIs, and the fact that it's impossible to
> identify all domain names in an URI (the usual slot after the '//' is
> easy, scheme-specific processing (which is not what URIs and IRIs are
> about) may be able to deal with some of 'mailto', but what do you do
> about domain names in query parts? Also, this syntax is part of RFC
> 3986, STD 66, a full IETF Standard.

Also, consider EAI (email address internationalization) and mailto: (or
something like 'imailto:' if we go with a separate scheme name for
internationalized addresses). If we use scheme-specific processing, we
can convert IDN labels to punycode, but for EAI, that would be useless
overkill, because EAI uses UTF-8. This works much better if we use
%-encoding for the whole IRI->URI conversion than if we try to be 'smart'.


> Overall, it's just a question of what escaping convention should be
> used. URIs have their specific escaping convention (%-encoding), and DNS
> has its specific escaping convention (punycode).
>
> Also please note that the IRI spec doesn't prohibit to use punycode when
> converting to URIs.
>
> In addition, please note that at least my personal implementation
> experience (adding IDN support to Amaya) shows that the overhead of
> supporting %-encoding in domain names in URIs is minimal, and helps
> streamline the implementation.
>
>> It seems to me that one cannot have it
>> both ways -- either the application knows whether a string is a
>> public DNS reference that must conform _only_ to IDNA
>> requirements (but then can be restricted to A-labels) or the
>> application does not know and therefore must conform to DNS
>> requirements for label lengths.
>
> There is absolutely no need to restrict *all* references just because
> *some of them* may use other resolver systems with other length
> restrictions (which may be "63 octets per label when measured in UTF-8"
> or something completely different). It would be very similar to saying
> "Some compilers/linkers can only deal with identifiers 6 characters or
> shorter, so all longer identifiers are prohibited."

In addition, for IDNA2003 (which we are using for implementation
experience), a label being in UTF-8 means that it may not yet have been
nameprepped. That in turn implies that it may contain non-NFKC
characters, which take more or less space than the nameprepped version
of UTF-8. If there were indeed implementations that did conversion to
length-string pairs in UTF-8 and only later applied punycode, there
could be cases where an IDN label may or may not resolve depending on
whether input was normalized or not. So it could e.g. resolve on a Linux
or Windows system (these use precomposed characters mostly identical to
NFC), but not resolve on a Mac (which uses decomposed characters, taking
more space). Weird and improbable.
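
The size difference is easy to reproduce. A minimal Python sketch using the
standard unicodedata module; the sample label (25 copies of 'é') is made up,
and NFC/NFD stand in for the precomposed and decomposed forms mentioned
above.

```python
import unicodedata

# A made-up label: 25 copies of U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
label = "\u00e9" * 25

nfc = unicodedata.normalize("NFC", label)   # precomposed form
nfd = unicodedata.normalize("NFD", label)   # decomposed form: 'e' + U+0301

nfc_octets = len(nfc.encode("utf-8"))
nfd_octets = len(nfd.encode("utf-8"))

print("NFC:", nfc_octets, "octets, fits in 63:", nfc_octets <= 63)
print("NFD:", nfd_octets, "octets, fits in 63:", nfd_octets <= 63)
```

The same label passes a 63-octet UTF-8 check in one form and fails it in the
other, which is exactly the kind of input-dependent behaviour described
above.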


Regards,   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


RE: Definitions limit on label length in UTF-8

by Dave Thaler-2

Martin J. Dürst writes:

> Hello John,
>
> [Dave, this is Cc'ed to you because of some discussion relating to
> draft-iab-idn-encoding-00.txt.]
>
> [I'm also cc'ing public-iri@... because of the IRI-related issue at
> the end.]
>
> [Everybody, please remove the Cc fields when they are unnecessary.]
>
>
> Overall, I'm afraid that on this issue, more convoluted explanations
> won't convince me nor anybody else, but I'll nevertheless try to answer
> your discussion below point-by-point.
>
> What I (and I guess others on this list) really would like to know is
> whether you have any CONCRETE reports or evidence regarding problems
> with IDN labels that are longer than 63 octets when expressed in UTF-8.
>
> Otherwise, Michel has put it much better than me: "given the lack of
> issues with IDNA2003 on that specific topic there are no reasons to
> introduce an incompatible change".
>
>
> On 2009/09/12 0:47, John C Klensin wrote:
> >
> > --On Friday, September 11, 2009 17:37 +0900 "\"Martin J.
> > Dürst\""<duerst@...>  wrote:
> >
> >>> (John claimed that the email context required such a
> >>> rule, but I did not bother to confirm that.)
> >> Given dinosaur implementations such as sendmail, I can
> >> understand the concern that some SMTP implementations may not
> >> easily be upgradable to use domain names with more than 255
> >> octets or labels with more than 63 octets. In that case, I
> >> would have expected at least a security warning at
> >> http://tools.ietf.org/html/rfc4952#section-9 (EAI is currently
> >> written in terms of IDNA2003, and so there are no length
> >> restrictions on U-labels).
> >
> > I obviously have not been explaining this very well.  The
> > problem is not "dinosaur implementations"
>
> Okay, good.
>
> > but a combination of
> > two things (which interact):
> >
> > (1) Late resolution of strings, possibly through APIs that
> > resolve names in places that may not be the public DNS.
> > Systems using those APIs may keep strings in UTF-8 until very
> > late in the process, even passing the UTF-8 strings into the
> > interface or converting them to ACE form just before calling the
> > interface.  Either way, because other systems have come to rely
> > on the 63 octet limit, strings longer than 63 characters pose a
> > risk of unexpected problems.  The issues with this are better
> > explained in draft-iab-idn-encoding-00.txt, which I would
> > strongly encourage people in this WG to go read.

Actually, systems using those APIs, which are the "standard"
(with a lowercase s) APIs, may keep strings in UTF-8 (or even
UTF-16 for common but non-"standard" variants) until very late, and
may keep strings in UTF-8 without ever converting them for some
protocols, e.g. mDNS, that are defined to use UTF-8.

> I have indeed read draft-iab-idn-encoding-00.txt (I sent comments to
> the
> author and the IAB and copied this list). That document mentions the
> length restrictions, as essentially the only restrictions in DNS
> itself,
> rather than in things on top of it. That document also (well, mainly)
> discusses the issue of names being handed down into APIs in various
> forms (UTF-8, UTF-16, punycode, legacy encodings,...), and being
> resolved by various mechanisms (DNS, NetBIOS, mDNS, hosts file,...),
> and
> the problem that these mechanisms may use and expect different
> encodings
> for non-ASCII characters.
>
> However, I haven't found any mention, nor even a hint, in that
> document,
> of a need to restrict punycode labels to less than 63 octets when
> expressed in UTF-8.

I agree with the above characterization.


> The document mentions (as something that might happen, but shouldn't)
> that an application may pass a UTF-8 string to something like
> getaddrinfo, and that string may be passed directly to the DNS. First,
> if this happens, IDNA has already lost.

I don't agree with the "shouldn't", and certainly it was not
the intent of draft-iab-idn-encoding-00.txt to actually state
whether this "shouldn't" happen, but that it "can" happen
(and perhaps "does").  There's also a potential argument in the doc
that this is not harmful (see 2nd paragraph of section 4 for
instance, and extrapolate from there).


> Second, whether the string is
> UTF-8 or pure ASCII, if the API isn't prepared to handle labels longer
> than 63 octets and overall names longer than 255 octets defensively
> (i.e. return something like 'not found'), then the programmer should be
> fired. Anyway, in that case, the problem isn't with UTF-8.
>
> What draft-iab-idn-encoding-00.txt essentially points out is that
> different name resolution services use different encodings for non-
> ASCII
> characters, and that currently different users (meaning applications)
> of
> a name resolution API may assume different encodings for non-ASCII
> characters, which creates all kinds of chances for errors. Some
> heuristics may help in some cases, but the right solution (as with all
> cases where characters, and in particular non-ASCII ones, are involved)
> is to clearly say where which encoding is used. A very simple example
> for this is GetAddrInfoW, which assumes UTF-16.
>
> The only potential problem that I see from the discussion in
> draft-iab-idn-encoding-00.txt is the following: Some labels containing
> non-ASCII characters that fit into 63 octets in punycode and therefore
> can be resolved with the DNS may not be resolvable with some other
> resolution service because that service may use a different encoding
> (and may or may not have different length limits).
>
> I have absolutely nothing against some text in a Security
> Considerations
> section or in Rationale pointing out that if you want to set up some
> name or label for resolution via multiple different resolution
> services,
> you have to take care that you choose your names and labels so that
> they
> meet the length restrictions for all those services. But that doesn't
> imply at all that we have to artificially restrict the length of
> punycode labels by counting octets in UTF-8.

Completely agree with all of the above.  I think a brief discussion of
this issue may make sense in the next version of draft-iab-idn-encoding,
if we can get IAB consensus on text.


> > (2) The "conversion of DNS name formats" issue that has been
> > extensively discussed as part of the question of alternate label
> > separators (sometimes described in our discussions as
> > "dot-oids").  Applications that use domain names, including
> > domain names that are not going to be resolved (or even looked
> > up), must be able to be freely and accurately converted between
> > DNS-external (dot-separated labels) and DNS-internal
> > (length-string pairs) formats _without_ knowing whether they are
> > IDNs or not.
>
> I'm not exactly sure what you mean here. If you want to say "without
> checking whether they contain xn-- prefixes and punycode or not", then
> I
> can agree, but that cannot motivate a UTF-8 based length restriction.

Right.  I'm not sure why most "applications" would care about DNS-
internal (length-string pairs) formats, only NULL-terminated
strings (containing dot-separated labels) that get passed to
getaddrinfo-like functions.  Most applications are (and should be)
oblivious to the fact that DNS or some other protocol is used for
resolving names.


> If you say that applications, rather than first converting U-label ->
> A-label and then converting from dot-separated to length-string
> notation, have to be able to first convert to length-string notation
> and
> then convert U-labels to A-labels, then I contend that nobody in their
> right mind would do it that way, and even less if "dot-oids" are
> involved. For a starter, U-labels don't have a fixed encoding.
>
> > As discussed earlier, one of several reasons for
> > that requirement is that, in non-IDNA-aware contexts, labels in
> > non-IDNA-aware applications or contexts may be perfectly valid
> > as far as the DNS is concerned, because the only restriction the
> > DNS (and the normal label type) imposes is "octets".
>
> If and where somebody has binary labels, of course these binary labels
> must not be longer than 63 octets. But IDNA doesn't use binary labels,
> and doesn't stuff UTF-8 into DNS protocol slots, so for IDNA, any
> length
> restrictions on UTF-8 are irrelevant.
>
> > That
> > length-string format has a hard limit of 63 characters that can
> > be exceeded only if one can figure out how to get a larger
> > number into six bits (see RFC1035, first paragraph of Section
> > 3.1, and elsewhere).
>
> I very well know that the 63 octets (not characters) limit is a hard
> one. In the long run, one might imagine an extension to DNS that uses
> another label format, without this limitation, but there is no need at
> all to go there for this discussion.
>
> > If we permit longer U-label strings on the
> > theory that the only important restriction is on A-labels, we
> > introduce new error states into the format conversion process.
>
> For IDNA, only A-labels get sent through the DNS protocol, so only
> there, the length restrictions for labels is relevant. If somebody gets
> this wrong in the format conversion process (we currently don't have
> any
> reports on that), then that's their problem (and we can point it out in
> a Security section or so).
>
> > If this needs more explanation somewhere (possibly in
> > Rationale), I'm happy to try to do that.  But I think
> > eliminating the restriction would cause far more problems than
> > it is worth.
>
> It hasn't caused ANY problems in IDNA2003. There is nothing new in
> IDNA2008 that would motivate a change. *Running code*, one of the
> guidelines of the IETF, shows that the restriction is unnecessary.
>
>
> > I note that, while I haven't had time to respond, some of the
> > discussion on the IRI list has included an argument that domain
> > names in URIs cannot be restricted to A-label forms but must
> > include %-escaped UTF-8 simply because those strings might not
> > be public-DNS domain names but references to some other database
> > or DNS environment.
>
> It's not 'simply because'. It's first and foremost because of the
> syntactic uniformity of URIs, and the fact that it's impossible to
> identify all domain names in an URI (the usual slot after the '//' is
> easy, scheme-specific processing (which is not what URIs and IRIs are
> about) may be able to deal with some of 'mailto', but what do you do
> about domain names in query parts? Also, this syntax is part of RFC
> 3986, STD 66, a full IETF Standard.
>
> Overall, it's just a question of what escaping convention should be
> used. URIs have their specific escaping convention (%-encoding), and
> DNS
> has its specific escaping convention (punycode).
>
> Also please note that the IRI spec doesn't prohibit to use punycode
> when
> converting to URIs.
>
> In addition, please note that at least my personal implementation
> experience (adding IDN support to Amaya) shows that the overhead of
> supporting %-encoding in domain names in URIs is minimal, and helps
> streamline the implementation.
>
> > It seems to me that one cannot have it
> > both ways -- either the application knows whether a string is a
> > public DNS reference that must conform _only_ to IDNA
> > requirements (but then can be restricted to A-labels) or the
> > application does not know and therefore must conform to DNS
> > requirements for label lengths.
>
> There is absolutely no need to restrict *all* references just because
> *some of them* may use other resolver systems with other length
> restrictions (which may be "63 octets per label when measured in UTF-8"
> or something completely different). It would be very similar to saying
> "Some compilers/linkers can only deal with identifiers 6 characters or
> shorter, so all longer identifiers are prohibited."

I agree with that.


> > For our purposes, the only
> > sensible way, at least IMO, to deal with this is to require
> > conformance to both sets of rules, i.e., 63 character maximum
> > for A-labels and 63 character maximum for U-labels.
>
> As far as I understand punycode, it's impossible to encode a Unicode
> character in less than one octet. This means that a maximum of 63
> *characters* for U-labels is automatically guaranteed by a maximum of
> 63
> characters/octets for A-labels.
>
> However, Defs clearly says "length in octets of the UTF-8 form", so I
> guess this was just a slip of your fingers.
>
> Regards,    Martin.

-Dave



Re: Definitions limit on label length in UTF-8

by John C Klensin-3

Martin,

First of all, please understand that I'm much more agnostic on
this issue than I think you assume.  I'm trying to reflect what
I believe I've been told by the WG and by various other
communities on the subject but, if the WG says "change it", I
will do so as editor and lose very little sleep about the
subject.

I'll let Dave and Stuart address the API and eventual migration
to pure UTF-8 issues.  I've been told that the ability to
convert to length-value form (with a six-bit length) _before_
Punycode conversion (or in an IDNA-unaware, "octets only"
implementation) is critical for the DNS community and for some
security-related applications which store DNS-based identifiers
in that form.  But I have no personal implementation experience
in either area, so perhaps Andrew and Paul can either speak to
those issues or point us to someone who can.

As a sometime-implementer, I'm nervous about unlimited-length
strings (as, based on recent interactions, are Stuart and Vint).
But it seems to me that the string length here is bounded in any
event -- with 59 characters of Punycode in an A-label, the upper
limit on a UTF-8 or UTF-32 string cannot be over 236 characters
and, I assume, would be considerably smaller.  Especially if we
can pin that number down (Adam?), I'd be a lot happier with text
that said, essentially, "the limit is on the A-label string, but
implementations should be aware that a maximum-length A-label
can convert to a U-label of up to NNN characters" than saying
"unlimited" and I think some others would be too.

All of that said, I'm not persuaded by the "there have been no
issues raised, therefore there is no problem" argument.  The
reality is that, for mnemonic and typing convenience, people
generally prefer shorter labels to longer ones.  Other than in
test demonstrations and as part of efforts to encode other types
of information in DNS labels, I don't believe I've ever seen a
60+ character ASCII label in the wild.  Regardless of script, a
few such labels in the same FQDN would not only be nearly
impossible for most people to enter correctly but also would
guarantee line-wrapping of DNS names in most screen-layout and
documentation arrangements... never an ideal situation.   That
isn't an argument for banning labels of that length or longer;
it does suggest a reason why no problems have been identified
other than "people have been using this for years with no
difficulty".

regards,
    john



--On Saturday, September 12, 2009 12:14 +0900 "\"Martin J.
Dürst\"" <duerst@...> wrote:

> Hello John,
>
> [Dave, this is Cc'ed to you because of some discussion
> relating to draft-iab-idn-encoding-00.txt.]
>
> [I'm also cc'ing public-iri@... because of the IRI-related
> issue at the end.]
>
> [Everybody, please remove the Cc fields when they are
> unnecessary.]
>
>
> Overall, I'm afraid that on this issue, more convoluted
> explanations won't convince me nor anybody else, but I'll
> nevertheless try to answer your discussion below
> point-by-point.
>
> What I (and I guess others on this list) really would like to
> know is whether you have any CONCRETE reports or evidence
> regarding problems with IDN labels that are longer than 63
> octets when expressed in UTF-8.
>
> Otherwise, Michel has put it much better than me: "given the
> lack of issues with IDNA2003 on that specific topic there are
> no reasons to introduce an incompatible change".