[Bug 8245] New: [Ser] Error for illegal characters in HTML omits some control characters

by Bugzilla from bugzilla@wiggum.w3.org

http://www.w3.org/Bugs/Public/show_bug.cgi?id=8245

           Summary: [Ser] Error for illegal characters in HTML omits some
                    control characters
           Product: XPath / XQuery / XSLT
           Version: Recommendation
          Platform: All
               URL: http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Serialization
        AssignedTo: zongaro@...
        ReportedBy: zongaro@...
         QAContact: public-qt-comments@...


According to section 7.3 of Serialization,[1] "Certain characters, specifically
the control characters #x7F-#x9F, are legal in XML but not in HTML. It is a
serialization error [err:SERE0014] to use the HTML output method when such
characters appear in the instance of the data model. The serializer MUST signal
the error."

The definition of the error in appendix B[2] repeats this with a slightly
different formulation:  "It is an error to use the HTML output method when
characters which are legal in XML but not in HTML, specifically the control
characters #x7F-#x9F, appear in the instance of the data model."

The control characters #x7F through #x9F are indeed the only characters
permitted in XML 1.0 that are not permitted in HTML.  However, the control
characters #x1 through #x1F, other than #x9, #xA and #xD, are permitted in
XML 1.1 (though only as character references), and they too are not permitted
in HTML, per the SGML declaration of HTML 4.[3]


I suggest the following corrections:

. In the third paragraph of section 7.3, change "specifically the control
characters #x7F-#x9F, are legal in XML" to "specifically the control characters
#x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F, are legal in one or both versions of
XML, but not in HTML"

. In appendix B, in the definition of err:SERE0014, change "specifically the
control characters #x7F-#x9F" to "specifically the control characters #x1-#x8,
#xB, #xC, #xE-#x1F and #x7F-#x9F"
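
For illustration only, here is a minimal Python sketch of the check that the
corrected wording implies. It is not taken from any serializer; the names
HTML_DISALLOWED, SerializationError and check_html_text are hypothetical, and
the set of code points is simply the list proposed above.

```python
# Code points from the proposed wording:
# #x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F.
HTML_DISALLOWED = frozenset(
    list(range(0x01, 0x09))    # #x1-#x8
    + [0x0B, 0x0C]             # #xB, #xC
    + list(range(0x0E, 0x20))  # #xE-#x1F
    + list(range(0x7F, 0xA0))  # #x7F-#x9F
)


class SerializationError(Exception):
    """Hypothetical exception type carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code


def check_html_text(text):
    """Return text unchanged, or raise if it holds a character listed above."""
    for offset, ch in enumerate(text):
        if ord(ch) in HTML_DISALLOWED:
            raise SerializationError(
                "err:SERE0014",
                f"character #x{ord(ch):X} at offset {offset} "
                "is not permitted in HTML",
            )
    return text
```

Tab, line feed and carriage return (#x9, #xA, #xD) pass the check, since they
are permitted in HTML as well as in both versions of XML.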


[1] http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA
[2] http://www.w3.org/TR/xslt-xquery-serialization/#ERRSERE0014
[3] http://www.w3.org/TR/html401/sgml/sgmldecl.html


--
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.



Michael Kay <mike@...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mike@...




--- Comment #1 from Michael Kay <mike@...>  2009-11-09 09:30:58 ---
Is this now a complete list? Will it always remain a complete list? Might it
not be better to change the "specifically" to "such as"?



Henry Zongaro <zongaro@...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




--- Comment #2 from Henry Zongaro <zongaro@...>  2009-11-12 16:35:09 ---
Yes, it's quite possible that an explicit enumeration of characters will become
out of date.  I had worried about that, but I was also concerned that the list
of proscribed characters in HTML is so obscure that simply saying "such as"
wouldn't be of much help to either implementers or users.  (After seven years
of experience with implementing XSLT, it took me about an hour to discover
where the list appears.  I'd like to save others that pain.)

How would you feel about the following proposed edits, which list all the
control characters, while still hedging by using "such as"?

. In the third paragraph of section 7.3, change "specifically the control
characters #x7F-#x9F, are legal in XML" to "such as the control characters
#x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F, are legal in one or both versions of
XML, but not in HTML"

. In appendix B, in the definition of err:SERE0014, delete ", specifically the
control characters #x7F-#x9F,"


--- Comment #3 from Michael Kay <mike@...>  2009-11-12 16:41:21 ---
That looks fine to me.


--- Comment #4 from Henry Zongaro <zongaro@...>  2009-11-26 20:00:29 ---
At its teleconference of 2009-11-12,[4] the WG suggested the wording proposed
in comment #2 should be reworked to make it clear which control characters are
permitted by which version of XML - particularly as many people will not be as
familiar with the XML 1.1 Recommendation.  This is my revised proposal:

. In the third paragraph of section 7.3, change "Certain characters,
specifically the control characters #x7F-#x9F, are legal in XML but not in
HTML." to "Certain characters are legal in XML, but not in HTML -- for example,
the control characters #x7F-#x9F, are legal in both XML 1.0 and XML 1.1, and
the control characters #x1-#x8, #xB, #xC and #xE-#x1F are legal in XML 1.1, but
none of these is permitted in HTML."

. In appendix B, in the definition of err:SERE0014, delete ", specifically the
control characters #x7F-#x9F,"
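
As a rough aid for readers less familiar with XML 1.1, the following Python
sketch reports which XML version admits a given code point. It assumes the
Char productions of XML 1.0 and XML 1.1 as I understand them; the function
name xml_versions_allowing is hypothetical.

```python
def xml_versions_allowing(cp):
    """Return the XML versions whose Char production admits code point cp."""
    # Ranges shared by both versions.
    shared = (
        0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    versions = []
    # XML 1.0 admits only #x9, #xA and #xD below #x20.
    if cp in (0x9, 0xA, 0xD) or shared:
        versions.append("1.0")
    # XML 1.1 admits everything from #x1 upwards (excluding only #x0),
    # though the other C0 controls may appear only as character references.
    if 0x1 <= cp <= 0x1F or shared:
        versions.append("1.1")
    return versions
```

Under these assumptions, #x85 is reported for both versions, while #xB is
reported for XML 1.1 only, which matches the distinction drawn in the revised
wording.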


[4] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2009Nov/0028.html
(Member-only link)


--- Comment #5 from Henry Zongaro <zongaro@...>  2009-12-02 11:29:58 ---
At the joint teleconference of the XQuery and XSL Working Groups of
2009-12-01,[5] the proposal in comment #4 was accepted.  As only a few members
of the XSL WG were present on the call, I will bring the proposal back to that
working group for final ratification.

[5] http://lists.w3.org/Archives/Member/w3c-xsl-query/2009Dec/0005.html
(Member-only link)


Henry Zongaro <zongaro@...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[Ser] Error for illegal     |[Ser] Error for characters
                   |characters in HTML omits    |that are not permitted in
                   |some control characters     |HTML omits some control
                   |                            |characters




--- Comment #6 from Henry Zongaro <zongaro@...>  2009-12-02 18:57:24 ---
[Revising the summary.]


Henry Zongaro <zongaro@...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




--- Comment #7 from Henry Zongaro <zongaro@...>  2009-12-03 19:58:36 ---
At its teleconference of 2009-12-03,[6] the XSL Working Group ratified the
decision reported in comment #5.

This will be Serialization erratum SE.E15.

[6] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2009Dec/0008.html

