Illegal HTML decimal 149 and 150?

I've got a stylesheet that was developed and used under Saxon 6.5 and set to v1.1 of XSLT. I decided to start running against Saxon 9.1. A number of issues were flagged when I made the switch and also set the XSLT version to 2.0. Most of these I understood as issues with the XSLT syntax, but I'm puzzled by one of them. In the original stylesheet I had these lines:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html" indent="yes"/>

      <xsl:template match="ipb">
        <xsl:variable name="ipbcounter">
          <xsl:number format="001" level="multiple" count="ipb"/>
        </xsl:variable>
        <xsl:variable name="ipb_num" select="concat('./Output/IPB/f21', $ipbcounter, '.html')"/>
        <xsl:document href="{$ipb_num}" method="html">
          <html>
            <head>
              <title><xsl:value-of select="./table/title"/></title>
              <link rel="stylesheet" type="text/css" href="../StyleSheets/Mk45.css"/>
            </head>
            <body link="#0000ff">
              <xsl:apply-templates select="./table"/>
            </body>
          </html>
        </xsl:document>
      </xsl:template>

      <xsl:template match="ipbillus">
        <xsl:variable name="sht_num" select="./xref/@shtref"/>
        <xsl:variable name="board_num" select="unparsed-entity-uri(ancestor::ipb/figure/subfig/graphic[@boardno=$sht_num]/@boardno)"/>
        <xsl:variable name="fig_num" select="substring-after($board_num, 'Source/')"/>
        <a id="{@id}" href="javascript:OpenIPBFlash('{substring-before($fig_num, '.')}')">
          <xsl:number format="1" level="any" count="ipbillus | ipbnonillus" from="ipb"/></a>
      </xsl:template>

    .... other stuff removed

The problem occurs with [...]. In Saxon 6 no error is raised, but with Saxon 9 I get

    Error at xsl:text on line 496 of Mk45_master.xsl:
      SERE0014: Illegal HTML character: decimal 149

I'm curious why. I thought it might be an encoding issue.

In the generated HTML I get the entity reference, and I see that 160 is changed to nbsp:

    <td align="" valign="top" border="2">
      <a id="idx18427" href="javascript:OpenIPBFlash('')">1</a>
    </td>

I changed this string to be •–, which generates the UTF-8 codes (instead of entity references). All of this looks fine in the HTML files, but a third-party tool used for full-text indexing doesn't handle the entity reference the same way it handles the UTF-8 values.

Can you explain why decimal 149 is illegal in this case? I tried searching for an explanation of why it works in the HTML but is not allowed in XSLT and didn't find any leads. Ultimately my problem is with the full-text engine, but I was surprised at the changed results.

..dan
---------------------------------------------------------------------------
Danny Vint

Specializing in Panoramic Images of California and the West
http://www.dvint.com

Voice:510:522-4703
FAX: 801-749-3229

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@...
https://lists.sourceforge.net/lists/listinfo/saxon-help
Re: Illegal HTML decimal 149 and 150?

Dan Vint wrote:

> The problem occurs with [...]. In Saxon 6 no error is raised,
> but with Saxon 9 I get
> Error at xsl:text on line 496 of Mk45_master.xsl:
> SERE0014: Illegal HTML character: decimal 149
>
> I'm curious why.

See http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA, it says:

"Certain characters, specifically the control characters #x7F-#x9F, are legal in XML but not in HTML. It is a serialization error [err:SERE0014] to use the HTML output method when such characters appear in the instance of the data model. The serializer MUST signal the error."

Unicode character #149 is #x95, so it is a control character that is not allowed in HTML, and the XSLT 2.0/XQuery 1.0 serialization is, it seems, stricter than the XSLT 1.0 serialization. I can't point you to the part of the HTML 4 specification that excludes those characters (I suspect http://www.w3.org/TR/html4/HTML4.decl might do that), but I trust the authors of the XSLT 2.0/XQuery 1.0 serialization specification on that.

--
Martin Honnen
http://msmvps.com/blogs/martin_honnen/
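As an illustration of the rule quoted above, here is a minimal Python sketch that scans a string for code points in the range #x7F-#x9F, the ones the HTML output method is required to reject. The function name `find_illegal_html_chars` and the sample string are hypothetical and are not part of Saxon or of the stylesheet under discussion.

```python
def find_illegal_html_chars(text):
    """Return (index, code point) pairs for characters in U+007F..U+009F.

    This mirrors the range named in the serialization rule quoted above;
    it is an illustration only, not how Saxon implements the check.
    """
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if 0x7F <= ord(ch) <= 0x9F]


# Hypothetical sample: U+0095 (decimal 149) is flagged, U+2022 (the real bullet) is not.
sample = "item \u0095 one \u2022 two"
print(find_illegal_html_chars(sample))  # [(5, 149)]
```

Running a check like this over the source text before transforming can help locate where such characters enter the data.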
Re: Illegal HTML decimal 149 and 150?

> > The problem occurs with [...]. In Saxon 6 no error is
> > raised, but with Saxon 9 I get Error at xsl:text on line 496 of
> > Mk45_master.xsl:
> > SERE0014: Illegal HTML character: decimal 149
> >
> > I'm curious why.
>
> See
> http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA,
> it says:
> "Certain characters, specifically the control characters
> #x7F-#x9F, are legal in XML but not in HTML. It is a
> serialization error [err:SERE0014] to use the HTML output
> method when such characters appear in the instance of the
> data model. The serializer MUST signal the error."
>

There was a lot of debate about this rule, which is indeed new in XSLT 2.0. The debate involved a number of W3C working groups, so it got quite political.

If these characters are used in XML, they are nearly always used wrongly: typically the code decimal 149 is being used to represent a bullet mark (which is 149 in CP1252, but x2022 in Unicode). The decision was to make these characters illegal when generating HTML, forcing users to correct their data at source.

One way of correcting it might simply be to change the encoding declaration on the XML document to encoding="cp1252". Another fix would be to use a character map in the stylesheet to substitute the characters correctly.

I'm afraid I can't point to the chapter and verse that says these characters are illegal in HTML either, but I'm sure it's right!

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
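The CP1252-versus-Unicode mix-up described above can be checked directly. The following Python sketch shows that byte 149 decodes to the bullet (x2022) under CP1252, while the Unicode code point 149 is a control character, and then sketches one possible way to correct such data at source. The helper name `repair_c1_controls` is hypothetical, and the sketch assumes that every code point in the x80-x9F range was intended as a CP1252 character, which may not hold for all data.

```python
import unicodedata

# In the CP1252 code page, byte 149 is the bullet and byte 150 is the en dash.
print(hex(ord(bytes([149]).decode("cp1252"))))  # 0x2022
print(hex(ord(bytes([150]).decode("cp1252"))))  # 0x2013

# The Unicode code point with the same number is a control character.
print(unicodedata.category(chr(149)))  # Cc


def repair_c1_controls(text):
    """Reinterpret code points U+0080..U+009F as CP1252 bytes.

    Assumption: these code points came from CP1252 data that was
    mislabelled. Bytes that CP1252 leaves undefined (such as 0x81)
    are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


print(repair_c1_controls("\u0095\u0096") == "\u2022\u2013")  # True
```

The result matches the •– replacement described in the original question: 149 becomes the bullet and 150 becomes the en dash.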
Re: Illegal HTML decimal 149 and 150?

At 02:10 PM 11/2/2009, you wrote:

>There was a lot of debate about this rule, which is indeed new in XSLT 2.0.
>The debate involved a number of W3C working groups, so it got quite
>political. If these characters are used in XML, they are nearly always used
>wrongly: typically the code decimal 149 is being used to represent a bullet
>mark (which is 149 in CP1252, but x2022 in Unicode). The decision was to
>make these characters illegal when generating HTML, forcing users to correct
>their data at source. One way of correcting it might simply be to change the
>encoding declaration on the XML document to encoding="cp1252". Another fix
>would be to use a character map in the stylesheet to substitute the
>characters correctly.

I figured it was due to the encoding, but I couldn't see why it worked in one version but not the other. The encoding set in the HTML output in both cases was UTF-8.

>I'm afraid I can't point to the chapter and verse that says these characters
>are illegal in HTML either, but I'm sure it's right!

thanks

..dan
---------------------------------------------------------------------------
Danny Vint

Panoramic Photography
http://www.dvint.com

voice: 502-749-6179
Re: Illegal HTML decimal 149 and 150?

>
> I figured it was due to the encoding, but I couldn't see why
> it worked from one version but not the other. The encoding
> set in the HTML output in both cases was UTF-8
>

Because this was a new rule introduced in XSLT 2.0. (Well, XSLT 1.0 was unspecific, and a 1.0 processor could have imposed this rule, but the spec didn't say it had to, and Saxon 6 didn't.)

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
Re: Illegal HTML decimal 149 and 150?

Sorry, I understood it was a change in the XSLT spec, but I couldn't figure it out from the standpoint of the HTML that was generated.

..dan
---------------------------------------------------------------------------
Danny Vint

Specializing in Panoramic Images of California and the West
http://www.dvint.com

Voice:510:522-4703
FAX: 801-749-3229

On Tue, 3 Nov 2009, Michael Kay wrote:

>>
>> I figured it was due to the encoding, but I couldn't see why
>> it worked from one version but not the other. The encoding
>> set in the HTML output in both cases was UTF-8
>>
>
> Because this was a new rule introduced in XSLT 2.0. (Well, XSLT 1.0 was
> unspecific, and a 1.0 processor could have imposed this rule, but the spec
> didn't say it had to, and Saxon 6 didn't.)
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay