Hi All,
According to W3C's "Excessive DTD Traffic", we should not download any DTD,
because "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" denotes a
namespace, not a resource, although it looks and works like a URI.
> A while ago we put a system in place to monitor our servers for abusive
> request patterns and send 503 Service Unavailable responses with custom
> text depending on the nature of the abuse. Our hope was that the authors
> of misbehaving software and the administrators of sites who deployed it
> would notice these errors and make the necessary fixes to the software
> responsible.
>> To read the DTD, one might be able to use an alternate URL based on the
>> public identifier. Unfortunately, catalogs are not in wide-spread use,
>> and W3C does nothing to promote them.
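
To make the catalog idea concrete, here is a minimal sketch of how a public identifier can be mapped to a local file with an OASIS XML catalog. It assumes a JDK with the `javax.xml.catalog` package (Java 9 or later); the class name `CatalogLookupSketch`, the temporary directory, and the one-line placeholder DTD are illustrative assumptions, not anything taken from Nutch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: resolves the XHTML public identifier through a catalog.
class CatalogLookupSketch {
    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";
    static final String SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    /** Writes a one-entry catalog and a placeholder DTD into dir; returns the catalog path. */
    static Path writeCatalog(Path dir) throws IOException {
        String catalog =
            "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + "  <public publicId=\"" + PUBLIC_ID + "\" uri=\"xhtml1-strict.dtd\"/>\n"
            + "</catalog>\n";
        // Placeholder content only; a real setup would store the actual DTD here.
        Files.write(dir.resolve("xhtml1-strict.dtd"),
                    "<!ELEMENT html ANY>".getBytes(StandardCharsets.UTF_8));
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, catalog.getBytes(StandardCharsets.UTF_8));
        return catalogFile;
    }

    /** Returns the system identifier the catalog maps the public identifier to. */
    static String resolveLocally(Path catalogFile) {
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
        InputSource source = resolver.resolveEntity(PUBLIC_ID, SYSTEM_ID);
        return source.getSystemId();
    }

    /** Builds the catalog in a temporary directory and performs one lookup. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("dtd-catalog");
            return resolveLocally(writeCatalog(dir));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints a file: URI pointing at the local copy instead of www.w3.org.
        System.out.println(demo());
    }
}
```

Because `CatalogResolver` also implements `org.xml.sax.EntityResolver`, the same object could be handed to a parser so that the lookup happens transparently during parsing.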
--
Best regards,
Marcel Schnippe
Changemanager PER
Provinzial Rheinland
On Fri, Jun 12, 2009 at 15:12, Andrzej Bialecki <ab@...> wrote:
Doğacan Güney wrote:
Hi all,
Does anyone know why TestNodeWalker has kept failing for the last couple of days?
I can reproduce the error on my computer; the test log looks like this:
Testsuite: org.apache.nutch.util.TestNodeWalker
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.101 sec
------------- Standard Error -----------------
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.nutch.util.TestNodeWalker.testSkipChildren(TestNodeWalker.java:63)
Hmm, error 503 is "Service Unavailable". Either this is a genuine problem
at www.w3.org, or access to this site is not available from the machine
that runs the tests. I believe we should do something similar to what we
did for generating the web docs, i.e. use our own catalog or DTDs instead
of downloading DTDs from the net.
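
One way the "use our own DTDs" suggestion could look in practice is a SAX `EntityResolver` that answers requests for the XHTML public identifier from a local source. This is only a sketch: the class name `LocalDtdResolver` is made up, the one-line `STUB_DTD` is a placeholder standing in for a real local copy of the DTD, and the parser is the JDK's JAXP `DocumentBuilder` rather than the Xerces `DOMParser` the test uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves the XHTML Strict DTD locally, keyed by public identifier.
class LocalDtdResolver implements EntityResolver {
    static final String XHTML_STRICT = "-//W3C//DTD XHTML 1.0 Strict//EN";
    // Placeholder only: a real setup would ship the actual xhtml1-strict.dtd
    // (and the entity files it references) with the test resources.
    static final String STUB_DTD = "<!ELEMENT html ANY>";

    int localHits = 0; // how often the local copy was served

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (XHTML_STRICT.equals(publicId)) {
            localHits++;
            InputSource source = new InputSource(new StringReader(STUB_DTD));
            source.setPublicId(publicId);
            source.setSystemId(systemId);
            return source;
        }
        return null; // anything else falls back to the parser's default lookup
    }

    /** Parses xml with the given resolver installed. */
    static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"" + XHTML_STRICT + "\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html><body><p>test</p></body></html>";
        LocalDtdResolver resolver = new LocalDtdResolver();
        Document doc = parse(page, resolver);
        System.out.println(doc.getDocumentElement().getNodeName()
            + ", local DTD served " + resolver.localHits + " time(s)");
    }
}
```

The Xerces `DOMParser` also has a `setEntityResolver` method, so a resolver like this could be installed on the parser in the test without changing the test page itself.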
The DTD is defined like this (in the file TestNodeWalker.java):
private final static String WEBPAGE =
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
// ... rest of the webpage
How can we move that DTD to a local copy? Perhaps we should just remove
that line; I don't know if it does anything there.
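
If the DTD turns out not to matter for the test, an alternative to deleting the DOCTYPE line is to tell the parser not to fetch the external DTD at all. The sketch below assumes the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, set here through JAXP on the JDK's bundled parser; the class name `SkipExternalDtd` and the probe resolver are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: parses a page while leaving its external DTD unread.
class SkipExternalDtd {
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without loading the external DTD; probe may be null. */
    static Document parse(String xml, EntityResolver probe) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(LOAD_EXTERNAL_DTD, false); // non-validating parse only
            DocumentBuilder builder = factory.newDocumentBuilder();
            if (probe != null) {
                builder.setEntityResolver(probe); // lets a caller observe any DTD request
            }
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html><body><p>test</p></body></html>";
        Document doc = parse(page, null);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The trade-off is that anything the DTD would have supplied, such as named entities like `&nbsp;` or default attribute values, is no longer available to the parser, so this only fits pages that do not rely on them.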
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
--
Doğacan Güney