IRI issues (in quite some detail)

by "Martin J. Dürst"

This is a laundry list of issues that have come up on the IRI spec
update. They are grouped into things that are related where possible. I
hope this is a fairly complete initial pass, but I'm sure there are
still a few things missing.

In your replies, please distinguish addition of issues from discussion
of specific issues.



IRIs and IDNA
=============
- %encoding vs. punycode when converting from IRI to URI
   (see mail by Roy:
    http://lists.w3.org/Archives/Public/public-iri/2009Aug/0010.html
    and I-D by Dave Thaler:
    http://tools.ietf.org/html/draft-iab-idn-encoding)

- Update of Bidi section:
   - allow combining marks at end of component
   - adapt component restrictions to those in [IDNA-Bidi]
   - check about other syntactic characters (not only dot)
     and payload characters (e.g. %)
   [- rework examples]

- IDNA 2003 vs. IDNA 2008:
   - to map or not to map for IRI->URI and on resolution in general
     - what mapping to use (see http://www.unicode.org/reports/tr46/
       for a potential direction)
     - what to do about ß (sharp s) and ς (final sigma)
       - short term
       - long term
   - advice for authors:
     - Always use prepped (in IDNA 2003 terminology) or
       legal U-Label (in IDNA 2008 terminology)
     - Avoid separators other than '.'
     - Avoid IDNs that are not legal in either IDNA 2003 or 2008 ?
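
To make the %-encoding vs. punycode question and the sharp-s mapping issue above concrete, here is a minimal sketch. It assumes Python's built-in "idna" codec, which implements IDNA 2003 (Nameprep plus Punycode); the host names are hypothetical and chosen only for illustration. It does not show IDNA 2008 or TR46 behavior.

```python
from urllib.parse import quote

host = "bücher.example"  # hypothetical IDN, for illustration only

# Option 1: percent-encode the UTF-8 bytes of the host, as the generic
# IRI-to-URI conversion does for non-ASCII characters.
percent_encoded = quote(host, safe=".")

# Option 2: convert each label to its ASCII-compatible (punycode) form.
# Python's built-in "idna" codec follows IDNA 2003.
a_label = host.encode("idna").decode("ascii")

# IDNA 2003 maps sharp s to "ss" before encoding, so the original
# spelling cannot be recovered from the result.
mapped = "straße".encode("idna").decode("ascii")

print(percent_encoded)  # b%C3%BCcher.example
print(a_label)          # xn--bcher-kva.example
print(mapped)           # strasse
```

The two conversions give different ASCII strings for the same IRI host, which is why the choice matters for comparison and resolution.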


LEIRIs and HTML5 references
===========================

- Are there other "main areas" (like XML and HTML) that warrant similar
   'preferential treatment' [let's really hope not] (see also
   http://www.w3.org/International/iri-edit/spec-use-survey.html
   (way incomplete))

- Naming these explicitly (or not)
   - What's the best name for HTML5 references

- Using syntax or procedure for definition
   (syntax seems to work better for the requirements of XML and LEIRIs,
    procedure may work better for HTML5)

- Place in spec: Appendix? Separate section (for each, or for both
   together?)? As part of a section 5 (Normalization and Comparison;
   probably not, seems confusing to many people)

- Mix with main IRI->URI procedure or not (ideally separate, but may
   not be easy for some aspects)

- What to keep in 'host' specs (e.g. definition of whitespace?)


HTML5 reference specific issues
===============================

- '\' as path separator

- '#' in fragment identifiers

- '[' and ']' other than for IPv6 literals

- Processing of other characters not allowed

- treatment of lonely '%' (not followed by 2 hex digits)

- special behavior for encoding in http: and https: query parts
   (use document encoding if available instead of UTF-8)

- some more (to be completed, including pointers to relevant documents
   from Anne)

- How to advise authors,... against using 'bugwards-compatible' features
   (completed for LEIRIs, needs to be discussed and done for HTML5)
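
As an illustration of two of the items above (lonely '%' and '\' as path separator), here is a rough sketch of one possible lenient treatment. The function names are hypothetical, and the behavior shown is an assumption for discussion, not something this list or any spec has settled on.

```python
import re

# A '%' that is not followed by two hex digits is a "lonely" percent sign.
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_lone_percent(ref: str) -> str:
    """Escape each lonely '%' as '%25'; valid escapes are left untouched."""
    return LONE_PERCENT.sub("%25", ref)


def backslash_to_slash(ref: str) -> str:
    """Treat '\\' as a path separator by rewriting it to '/' (assumed behavior)."""
    return ref.replace("\\", "/")


print(escape_lone_percent("/sale/100%/a%20b"))  # /sale/100%25/a%20b
print(backslash_to_slash("dir\\file.html"))     # dir/file.html
```

Whether such repairs belong in the main IRI-to-URI procedure or in a separate HTML5-specific step is exactly the open question listed under "LEIRIs and HTML5 references".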


IRI issues
==========
(at http://www.w3.org/International/iri-edit/,
not already mentioned above)
- http://www.w3.org/International/iri-edit/#identity-101
- http://www.w3.org/International/iri-edit/#transcodeNFC-103


Registration issues
===================

- Allow definition of URI schemes simply in terms of IRIs?

- What other adjustments needed resulting from issues above?


Issues for individual schemes
=============================

- Piggybacking mailto:
   - Allowing UTF-8 officially where current email infrastructure
     does allow it
   - Fixing other issues in mailto:

- Updating mailto: for EAI (or creating a new scheme)

- Others?


URI issues (potentially)?
==========
- do '[' and ']' need to be forbidden in URIs
- does '#' need to be forbidden in URI fragment parts


Regards,   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


Re: IRI issues (in quite some detail)

by Bjoern Hoehrmann

* Martin J. Dürst wrote:
>URI issues (potentially)?
>==========
>- do '[' and ']' need to be forbidden in URIs
>- does '#' need to be forbidden in URI fragment parts

I do not think these are worth considering. There are existing
technologies that use them precisely because they have been forbidden,
to distinguish resource identifiers from other things in protocol
elements; e.g. XML Schema uses a "##identifier" syntax and "CURIEs" use
"[identifier]". Retroactively allowing them would do little more than
cause confusion.
--
Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/