Illegal HTML decimal 149 and 150?

by Dan Vint

I've got a stylesheet that was developed and used under Saxon 6.5 and set
to version 1.1 of XSLT. I decided to start running it against Saxon 9.1. A
bunch of issues were flagged when I made the switch and also set the XSLT
version to 2.0. Most of these I understood as issues with the XSLT syntax.

I'm puzzled by one of them. In the original stylesheet I had these
lines:

<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>

<xsl:template match="ipb">
     <xsl:variable name="ipbcounter">
        <xsl:number format="001" level="multiple" count="ipb"/>
     </xsl:variable>
     <xsl:variable name="ipb_num"
          select="concat('./Output/IPB/f21', $ipbcounter, '.html')"/>
     <xsl:document href="{$ipb_num}" method="html">
         <html>
            <head>
               <title><xsl:value-of select="./table/title"/></title>
               <link rel="stylesheet" type="text/css" href="../StyleSheets/Mk45.css"/>
            </head>
            <body link="#0000ff">
               <xsl:apply-templates select="./table"/>
            </body>
         </html>
     </xsl:document>
</xsl:template>

<xsl:template match="ipbillus">
    <xsl:variable name="sht_num" select="./xref/@shtref"/>
    <xsl:variable name="board_num" select="unparsed-entity-uri(ancestor::ipb/figure/subfig/graphic[@boardno=$sht_num]/@boardno)"/>
    <xsl:variable name="fig_num" select="substring-after($board_num, 'Source/')"/>
    &#149;&#150;&#160;<a id="{@id}"
       href="javascript:OpenIPBFlash('{substring-before($fig_num, '.')}')">
    <xsl:number format="1" level="any" count="ipbillus | ipbnonillus" from="ipb"/></a>
</xsl:template>

.... other stuff removed

The problem occurs with &#149;&#150;&#160;. In Saxon 6 no error is raised,
but with Saxon 9 I get
Error at xsl:text on line 496 of Mk45_master.xsl:
   SERE0014: Illegal HTML character: decimal 149

I'm curious why. I thought it might be an encoding issue. In the generated
HTML I get the entity references, and I see that the 160 is changed to nbsp:

<td align="" valign="top" border="2">
  &#149;&#150;&nbsp;<a id="idx18427" href="javascript:OpenIPBFlash('')">1</a>
</td>

I changed this string to use the Unicode bullet and en dash (•–) followed
by a no-break space, which generates the UTF-8 codes (instead of entity
references). All of this looks fine in the HTML files, but a second tool,
used for full-text indexing, doesn't handle the entity references the same
way it handles the UTF-8 values.
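
For illustration, the corrected literal might look like the sketch below. The
exact form used in the real stylesheet is not shown in this message, so the
hexadecimal character references here are an assumption, and the anchor has
been simplified (the href is omitted) to keep the fragment short.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes" encoding="UTF-8"/>

  <xsl:template match="ipbillus">
    <!-- U+2022 BULLET, U+2013 EN DASH, U+00A0 NO-BREAK SPACE,
         replacing the CP1252-style codes 149, 150 and 160 -->
    <xsl:text>&#x2022;&#x2013;&#xA0;</xsl:text>
    <!-- Simplified anchor: the original also carries a javascript: href -->
    <a id="{@id}">
      <xsl:number format="1" level="any"
                  count="ipbillus | ipbnonillus" from="ipb"/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```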

Can you explain why the decimal 149 is illegal in this case? I tried
searching around for an explanation of why it works in the HTML but is not
allowed in XSLT and didn't find any leads.

Ultimately my problem is with the full text engine, but I was surprised at
the changed results.

..dan

---------------------------------------------------------------------------
Danny Vint

Specializing in Panoramic Images of California and the West
http://www.dvint.com

Voice:510:522-4703
FAX: 801-749-3229

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@...
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Illegal HTML decimal 149 and 150?

by Martin Honnen-2

Dan Vint wrote:

> The problem occurs with &#149;&#150;&#160;. In Saxon 6 no error is raised,
> but with Saxon 9 I get
> Error at xsl:text on line 496 of Mk45_master.xsl:
>    SERE0014: Illegal HTML character: decimal 149
>
> I'm curious why.

See http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA, it says:
   "Certain characters, specifically the control characters #x7F-#x9F,
are legal in XML but not in HTML. It is a serialization error
[err:SERE0014] to use the HTML output method when such characters appear
in the instance of the data model. The serializer MUST signal the error."

Unicode character #149 is #x95, so it is a control character that is not
allowed in HTML; it seems the XSLT 2.0/XQuery 1.0 serialization is stricter
than the XSLT 1.0 serialization.

I can't point you to the part of the HTML 4 specification excluding
those characters (I suspect http://www.w3.org/TR/html4/HTML4.decl might
do that) but I trust the authors of the XSLT 2.0/XQuery 1.0
serialization specification on that.
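
One way to locate such characters before the serializer rejects them is a
small audit transform. The following is a minimal sketch, assuming an XSLT 2.0
processor; it is not part of the original stylesheet. It lists the decimal
code of every character in the range #x7F-#x9F found in the text nodes and
attributes of its input. Since a stylesheet is itself an XML document, the
stylesheet file can be supplied as the input to check its literal text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit every text node and attribute in the input document -->
    <xsl:for-each select="//text() | //@*">
      <!-- Code points 127 to 159 correspond to #x7F-#x9F -->
      <xsl:variable name="bad"
          select="string-to-codepoints(.)[. ge 127 and . le 159]"/>
      <xsl:if test="exists($bad)">
        <!-- Report the parent element name and the offending codes -->
        <xsl:value-of select="concat(name(..), ': ',
            string-join(for $c in $bad return string($c), ' '), '&#10;')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```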



--

        Martin Honnen
        http://msmvps.com/blogs/martin_honnen/

Re: Illegal HTML decimal 149 and 150?

by Michael Kay

> > The problem occurs with &#149;&#150;&#160;. In Saxon 6 no error is
> > raised, but with Saxon 9 I get Error at xsl:text on line 496 of
> > Mk45_master.xsl:
> >    SERE0014: Illegal HTML character: decimal 149
> >
> > I'm curious why.
>
> See
> http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA,
>  it says:
>    "Certain characters, specifically the control characters
> #x7F-#x9F, are legal in XML but not in HTML. It is a
> serialization error [err:SERE0014] to use the HTML output
> method when such characters appear in the instance of the
> data model. The serializer MUST signal the error."
>

There was a lot of debate about this rule, which is indeed new in XSLT 2.0.
The debate involved a number of W3C working groups, so it got quite
political. If these characters are used in XML, they are nearly always used
wrongly: typically the code decimal 149 is being used to represent a bullet
mark (which is 149 in CP1252, but x2022 in Unicode). The decision was to
make these characters illegal when generating HTML, forcing users to correct
their data at source. One way of correcting it might simply be to change the
encoding declaration on the XML document to encoding="cp1252". Another fix
would be to use a character map in the stylesheet to substitute the
characters correctly.
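
The character map approach could be sketched as follows. This is only an
illustration under stated assumptions: the map name "cp1252-fixup" is
hypothetical, and only the two characters discussed in this thread are mapped.
Note also that a numeric character reference such as &#149; always denotes
Unicode code point 149 whatever the encoding declaration says, so changing the
declaration helps only when the file contains the raw CP1252 byte.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "cp1252-fixup" is a hypothetical name chosen for this sketch -->
  <xsl:output method="html" indent="yes"
              use-character-maps="cp1252-fixup"/>

  <xsl:character-map name="cp1252-fixup">
    <!-- 149 (x95) is a bullet in CP1252; the Unicode bullet is U+2022 -->
    <xsl:output-character character="&#149;" string="&#x2022;"/>
    <!-- 150 (x96) is an en dash in CP1252; the Unicode en dash is U+2013 -->
    <xsl:output-character character="&#150;" string="&#x2013;"/>
  </xsl:character-map>

</xsl:stylesheet>
```

The other CP1252 characters in the 128-159 range could be added to the map in
the same way if the data uses them.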

I'm afraid I can't point to the chapter and verse that says these characters
are illegal in HTML either, but I'm sure it's right!

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 

 


Re: Illegal HTML decimal 149 and 150?

by Dan Vint

At 02:10 PM 11/2/2009, you wrote:

>There was a lot of debate about this rule, which is indeed new in XSLT 2.0.
>The debate involved a number of W3C working groups, so it got quite
>political. If these characters are used in XML, they are nearly always used
>wrongly: typically the code decimal 149 is being used to represent a bullet
>mark (which is 149 in CP1252, but x2022 in Unicode). The decision was to
>make these characters illegal when generating HTML, forcing users to correct
>their data at source. One way of correcting it might simply be to change the
>encoding declaration on the XML document to encoding="cp1252". Another fix
>would be to use a character map in the stylesheet to substitute the
>characters correctly.

I figured it was due to the encoding, but I couldn't see why it
worked in one version but not the other. The encoding set in the
HTML output in both cases was UTF-8.


>I'm afraid I can't point to the chapter and verse that says these characters
>are illegal in HTML either, but I'm sure it's right!

thanks

..dan


---------------------------------------------------------------------------
Danny Vint

Panoramic Photography
http://www.dvint.com

voice: 502-749-6179
     



Re: Illegal HTML decimal 149 and 150?

by Michael Kay

>
> I figured it was due to the encoding, but I couldn't see why
> it worked from one version but not the other. The encoding
> set in the HTML output in both cases was UTF-8
>

Because this was a new rule introduced in XSLT 2.0. (Well, XSLT 1.0 was
unspecific, and a 1.0 processor could have imposed this rule, but the spec
didn't say it had to, and Saxon 6 didn't.)

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 



Re: Illegal HTML decimal 149 and 150?

by Dan Vint

Sorry, I understood it was a change in the XSLT spec, but I couldn't figure
it out from the standpoint of the HTML that was generated.

..dan

---------------------------------------------------------------------------
Danny Vint

Specializing in Panoramic Images of California and the West
http://www.dvint.com

Voice:510:522-4703
FAX: 801-749-3229

On Tue, 3 Nov 2009, Michael Kay wrote:

>>
>> I figured it was due to the encoding, but I couldn't see why
>> it worked from one version but not the other. The encoding
>> set in the HTML output in both cases was UTF-8
>>
>
> Because this was a new rule introduced in XSLT 2.0. (Well, XSLT 1.0 was
> unspecific, and a 1.0 processor could have imposed this rule, but the spec
> didn't say it had to, and Saxon 6 didn't.)
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay