XHTML via Tidy not making it into XSLT


by Peter Flynn-2

I have a resource in my sitemap which makes a web page available as XHTML:

> <map:match pattern="fetch/**">
>   <map:generate src="http://{1}" type="html"/>
>   <map:transform src="xsl/as-is.xsl"/>
>   <map:serialize type="xhtml"/>
> </map:match>

I call this from within another XSLT file so that I can screen-scrape the
document for a specific element type, relying on Tidy to convert it to
XHTML first. The as-is.xsl stylesheet is a plain identity transform
matching "*". Ugly, but useful (there must be a more elegant way, but I
haven't found it). In the second XSLT file I have a template matching an
element type that holds the desired URI in an attribute:

> <xsl:apply-templates
>      select="document(concat('http://myserver/fetch/',@site))//
>              descendant::html:div[@class='foo']"/>
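
For reference, an identity transform of the kind described for as-is.xsl
might look like the sketch below. This is an assumption, not the actual
file from the post: it uses the common "@*|node()" pattern rather than a
bare "*", so that attributes, text and comments are copied through as well
as elements.

```xml
<?xml version="1.0"?>
<!-- Hypothetical as-is.xsl: copies the input to the output unchanged -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match every attribute and every node, copy it, then recurse -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```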

Constructing the URI and requesting it by hand from the terminal with curl,
wget, dog, etc. works fine, and the resulting XHTML file is usable (tested
with lxgrep to ensure that the XPath extracts the right element), so I
know that part works.

When accessed from within the second stylesheet, cocoon.log shows
Tidy successfully converting the remote page to XHTML, the same as when
tested from the terminal, but the data never makes it through to the
template for html:div, even though the namespace *is* declared in the
stylesheet :-) The log also contains a warning:

> WARN  (2009-10-23) 11:34.02:162 [sitemap.transformer.xslt] (/doc/test) TP-Processor9/TraxErrorListener: file:///xsl/tools.xsl:7:138

but it doesn't say what it found wrong (not very helpful). Line 7 of
tools.xsl is the apply-templates shown above; character 138 is the end
of that line.

Testing it from the command line with Saxon, I get this:

> Recoverable error on line 7 of file:/xsl/tools.xsl:
>   FODC0005: java.io.IOException: Server returned HTTP response code: 503 for URL:
>   http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

503 is a temporary overload, but that URI is retrievable with curl the
instant before and after using Saxon. And in any case, when going via
Cocoon it would cache the DTD (wouldn't it? to avoid overloading the W3C
with a gazillion requests for the DTD URI?)

I'm missing a trick here, but I can't see what.

///Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: XHTML via Tidy not making it into XSLT

by Alexander Daniel

On 23.10.2009, at 13:26, Peter Flynn wrote:

> Testing it from the command line with Saxon, I get this:
>
>> Recoverable error on line 7 of file:/xsl/tools.xsl:
>>   FODC0005: java.io.IOException: Server returned HTTP response code: 503 for URL:
>>   http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>
> 503 is a temporary overload, but that URI is retrievable with curl  
> the instant before and after using Saxon. And in any case, when  
> going via Cocoon it would cache the DTD (wouldn't it? to avoid  
> overloading the W3C with a gazillion requests for the DTD URI?)

Cocoon resolves DTDs through catalog-based entity resolution. The catalog
in [1] defines "xhtml1-strict.dtd" by its PUBLIC identifier,
"-//W3C//DTD XHTML 1.0 Strict//EN";
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" is the SYSTEM
identifier. I summarized some information about entity resolution in
Cocoon 2.2 in [2].
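
To illustrate the PUBLIC/SYSTEM distinction, here is a minimal sketch of
an OASIS XML catalog that maps both identifiers to a local copy of the
DTD. The relative path dtd/xhtml1-strict.dtd is hypothetical, and this is
not the catalog shipped with Cocoon at [1]; whether a given processor
consults such a file depends on how its resolver is configured.

```xml
<?xml version="1.0"?>
<!-- Hypothetical catalog.xml; the local path is an assumption -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- Resolve by PUBLIC identifier -->
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="dtd/xhtml1-strict.dtd"/>

  <!-- Resolve by SYSTEM identifier, so nothing is fetched from w3.org -->
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="dtd/xhtml1-strict.dtd"/>

</catalog>
```

The DTD itself pulls in three entity sets (xhtml-lat1.ent,
xhtml-symbol.ent, xhtml-special.ent) by relative system identifiers, so
local copies of those would need to sit next to the local DTD.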

I have also found that w3.org URLs are sometimes retrievable and
sometimes not, so you can't count on them.

Alex

[1] http://svn.apache.org/repos/asf/cocoon/tags/cocoon-2.2/cocoon-xml-resolver/cocoon-xml-resolver-1.0.0/src/main/resources/META-INF/cocoon/entities/w3c/catalog
[2] http://markmail.org/message/ueyhyzki2g7mzaa5
