XHTML via Tidy not making it into XSLT

I have a resource in my sitemap which makes a web page available as XHTML:
> <map:match pattern="fetch/**">
>   <map:generate src="http://{1}" type="html"/>
>   <map:transform src="xsl/as-is.xsl"/>
>   <map:serialize type="xhtml"/>
> </map:match>

I call this from within another XSLT file so that I can screen-scrape the document for a specific element type, having first ensured that it is Tidy'd to XHTML. The as-is.xsl is a plain identity transform to match "*". Ugly, but useful (there must be a more elegant way, but I haven't found it).

In the second XSLT file I have a match for an element type which holds the desired URI in an attribute:

> <xsl:apply-templates
>   select="document(concat('http://myserver/fetch/',@site))//
>           descendant::html:div[@class='foo']"/>

Constructing the URI and issuing it by hand from the terminal with curl, wget, dog, etc. works fine, and the resulting XHTML file is correct (tested with lxgrep to ensure that the XPath extracts the right element), so I know that bit works.

When the URI is accessed from within the second stylesheet, cocoon.log shows Tidy successfully converting the remote page to XHTML, the same as when tested from the terminal, but the data never makes it through to the template for html:div (the namespace *is* specified in the stylesheet :-)

In cocoon.log there's a warning:

> WARN (2009-10-23) 11:34.02:162 [sitemap.transformer.xslt] (/doc/test) TP-Processor9/TraxErrorListener: file:///xsl/tools.xsl:7:138

but it doesn't say what it found wrong (not very helpful). Line 7 of tools.xsl is the apply-templates shown above; char 138 is the end of that line.

Testing it from the command line with Saxon, I get this:

> Recoverable error on line 7 of file:/xsl/tools.xsl:
> FODC0005: java.io.IOException: Server returned HTTP response code: 503 for URL:
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

503 is a temporary overload, but that URI is retrievable with curl the instant before and after using Saxon. And in any case, when going via Cocoon it would cache the DTD (wouldn't it?
to avoid overloading the W3C with a gazillion requests for the DTD URI?)

I'm missing a trick here, but I can't see what.

///Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
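A minimal sketch of the extraction step, for checking it outside Cocoon in the same spirit as the lxgrep test. It uses Python's ElementTree rather than Saxon, and the sample document is hypothetical, standing in for Tidy's output. The html prefix has to be bound to the XHTML namespace, just as in the stylesheet, or no div will match. ElementTree does not retrieve the external DTD named in the DOCTYPE, so no request goes to w3.org; the flip side is that a named entity declared only in that DTD, such as &nbsp;, would be rejected as undefined.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the XHTML that Tidy produces for a fetched page.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>test</title></head>
  <body>
    <div class="nav">skip me</div>
    <div class="foo">wanted</div>
  </body>
</html>"""


def find_divs(xhtml: str, cls: str) -> list:
    """Return every XHTML div whose class attribute equals cls.

    Same intent as //html:div[@class='foo'] with the html prefix bound
    to the XHTML namespace. ElementTree never fetches the external DTD,
    so this runs without network access.
    """
    root = ET.fromstring(xhtml)
    ns = {"html": XHTML_NS}  # the same prefix binding the stylesheet needs
    return root.findall(f".//html:div[@class='{cls}']", ns)


for div in find_divs(SAMPLE, "foo"):
    print(div.text)  # prints: wanted
```

Without the namespace binding, a plain ".//div" search returns nothing, because every element in the parsed tree carries the XHTML namespace.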
Re: XHTML via Tidy not making it into XSLT

On 23.10.2009, at 13:26, Peter Flynn wrote:
> Testing it from the command line with Saxon, I get this:
>
>> Recoverable error on line 7 of file:/xsl/tools.xsl:
>> FODC0005: java.io.IOException: Server returned HTTP response code:
>> 503 for URL:
>> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>
> 503 is a temporary overload, but that URI is retrievable with curl
> the instant before and after using Saxon. And in any case, when
> going via Cocoon it would cache the DTD (wouldn't it? to avoid
> overloading the W3C with a gazillion requests for the DTD URI?)

Cocoon resolves DTDs through entity resolution with catalogs. In [1], "xhtml1-strict.dtd" is registered under the PUBLIC identifier "-//W3C//DTD XHTML 1.0 Strict//EN"; "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" is the SYSTEM identifier. I summarized some information about entity resolution in Cocoon 2.2 in [2].

I have also found that w3.org URLs are sometimes retrievable and sometimes not, i.e. you can't count on them.

Alex

[1] http://svn.apache.org/repos/asf/cocoon/tags/cocoon-2.2/cocoon-xml-resolver/cocoon-xml-resolver-1.0.0/src/main/resources/META-INF/cocoon/entities/w3c/catalog
[2] http://markmail.org/message/ueyhyzki2g7mzaa5
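A sketch of the catalog lookup described above, to show why the SYSTEM URL never has to be fetched once the PUBLIC identifier has an entry. It assumes an SGML/TR9401-style text catalog with one PUBLIC or SYSTEM entry per line (check [1] for the real format), and the base directory "/opt/cocoon/entities/w3c" is a hypothetical install path. System entries are consulted first and public entries second, which is the OASIS XML Catalogs order when public entries are preferred; only when neither matches would a parser fall back to retrieving the SYSTEM URL from w3.org.

```python
import posixpath
import shlex
from typing import Optional

# Hypothetical catalog text, assuming a TR9401-style format
# (one PUBLIC or SYSTEM entry per line); the real Cocoon file may differ.
EXAMPLE_CATALOG = '''
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xhtml1-strict.dtd"
'''


def load_catalog(text: str, base_dir: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (public_map, system_map), resolving relative paths against base_dir."""
    public: dict[str, str] = {}
    system: dict[str, str] = {}
    for line in text.splitlines():
        parts = shlex.split(line)  # handles the quoted identifiers
        if len(parts) != 3:
            continue  # skip blank lines and anything this sketch doesn't parse
        keyword, ident, path = parts
        # An absolute path in the catalog replaces base_dir entirely.
        target = posixpath.join(base_dir, path)
        if keyword.upper() == "PUBLIC":
            public.setdefault(ident, target)  # first matching entry wins
        elif keyword.upper() == "SYSTEM":
            system.setdefault(ident, target)
    return public, system


def resolve(public_id: Optional[str], system_id: Optional[str],
            public: dict[str, str], system: dict[str, str]) -> Optional[str]:
    """Return a local replacement for an external DTD, or None if unmapped."""
    if system_id is not None and system_id in system:
        return system[system_id]
    if public_id is not None and public_id in public:
        return public[public_id]
    return None  # no entry: a parser would now go to the network for system_id


public, system = load_catalog(EXAMPLE_CATALOG, "/opt/cocoon/entities/w3c")
print(resolve("-//W3C//DTD XHTML 1.0 Strict//EN",
              "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
              public, system))
# prints: /opt/cocoon/entities/w3c/xhtml1-strict.dtd
```

A command-line Saxon run with no catalog configured has no such table, which would explain why it goes straight to the SYSTEM URL and is exposed to the intermittent 503.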