Why does TestNodeWalker keep failing?

Hi all,

Does anyone know why TestNodeWalker has been failing for the last couple of days? I can reproduce the error on my computer; the test log looks like this:

    Testsuite: org.apache.nutch.util.TestNodeWalker
    Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.101 sec
    ------------- Standard Error -----------------
    java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
        at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
        at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
        at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
        at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.nutch.util.TestNodeWalker.testSkipChildren(TestNodeWalker.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.framework.TestSuite.runTest(TestSuite.java:208)
        at junit.framework.TestSuite.run(TestSuite.java:203)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
    ------------- ---------------- ---------------
    Testcase: testSkipChildren took 1.095 sec
        FAILED
    UL Content can NOT be found in the node
    junit.framework.AssertionFailedError: UL Content can NOT be found in the node
        at org.apache.nutch.util.TestNodeWalker.testSkipChildren(TestNodeWalker.java:79)

I have no idea why we get a 503 there.

-- 
Doğacan Güney
Re: Why does TestNodeWalker keep failing?

Doğacan Güney wrote:
> Hi all,
>
> Does anyone know why TestNodeWalker has been failing
> for the last couple of days?
>
> I can reproduce the error on my computer; the test log looks like
> this:
>
> Testsuite: org.apache.nutch.util.TestNodeWalker
> Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.101 sec
> ------------- Standard Error -----------------
> java.io.IOException: Server returned HTTP response code: 503 for URL:
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
> [...]
> at org.apache.nutch.util.TestNodeWalker.testSkipChildren(TestNodeWalker.java:63)

Hmm, error 503 means "Service Unavailable". Either this is a genuine problem at www.w3.org, or the site is not reachable from the machine that runs the tests. I believe we should do something similar to what we did for generating the web docs, i.e. use our own catalog or local copies of the DTDs instead of downloading them from the net.

-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
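A minimal sketch of the "use our own DTDs" idea, assuming the JDK's built-in JAXP DOM parser rather than the Xerces `DOMParser` the test calls directly: an `EntityResolver` maps a DTD's system identifier to a local file and refuses anything it does not know, so the parser never goes to the network. The class name `LocalDtdResolver` and its map-based lookup are hypothetical, not existing Nutch code.

```java
import java.io.StringReader;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: serves DTDs from local copies instead of the network.
final class LocalDtdResolver implements EntityResolver {
    // System identifier (the URL in the DOCTYPE) -> local copy of that DTD.
    private final Map<String, Path> localCopies;

    LocalDtdResolver(Map<String, Path> localCopies) {
        this.localCopies = Map.copyOf(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        // Map.copyOf rejects null keys on lookup, so guard against a null system ID.
        Path local = systemId == null ? null : localCopies.get(systemId);
        if (local == null) {
            // Returning null would let the parser download the URL itself,
            // which is what runs into W3C's 503 responses; fail loudly instead.
            throw new SAXException("No local copy of DTD " + systemId);
        }
        return new InputSource(local.toUri().toString());
    }

    // Parses an in-memory XML string with the given resolver installed.
    static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Refusing unknown identifiers is a deliberate choice in this sketch: a missing local copy shows up as a clear, reproducible test error rather than an intermittent network failure.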
Re: Why does TestNodeWalker keep failing?

On Fri, Jun 12, 2009 at 15:12, Andrzej Bialecki <ab@...> wrote:
The DTD is declared like this (in TestNodeWalker.java):

    private final static String WEBPAGE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
        // ... rest of the webpage

How can we make that DTD local? Perhaps we should just remove that line; I don't know whether it does anything there.

-- 
Doğacan Güney
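Whether the DOCTYPE line does anything can be checked directly. The sketch below (hypothetical class `DoctypeProbe`, again using JAXP instead of Xerces' `DOMParser`) installs a resolver that records every system identifier the parser asks for and answers with an empty DTD, so nothing is fetched. Assuming the rest of the test page is plain markup, the DOCTYPE is the only thing that makes the parser reach for xhtml1-strict.dtd; removing it is safe only if the page uses no entities, such as `&nbsp;`, that the XHTML DTD declares.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical probe: reports which external DTDs a document makes the parser request.
final class DoctypeProbe {
    static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            // Record the request, then hand back an empty DTD instead of downloading it.
            requested.add(systemId);
            return new InputSource(new StringReader(""));
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return requested;
    }
}
```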
Reply: Re: Why does TestNodeWalker keep failing?

Hi all,

According to W3C's "Excessive DTD Traffic" post, we should not download any DTD, because "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" denotes a namespace, not a resource, although it looks and works like a URI.

> A while ago we put a system in place to monitor our servers for abusive request patterns and send 503 Service Unavailable responses with custom text depending on the nature of the abuse. Our hope was that the authors of misbehaving software and the administrators of sites who deployed it would notice these errors and make the necessary fixes to the software responsible.

>> To read the DTD, one might be able to use an alternate URL based on the public identifier. Unfortunately, catalogs are not in widespread use, and W3C does nothing to promote them.

-- 
Best regards,
Marcel Schnippe
Change Manager PER
Provinzial Rheinland, the insurer of the Sparkassen
40195 Düsseldorf
Phone: 0211/978-1378, Fax: 0211/978-41378
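Looking DTDs up by public identifier is what XML catalogs are for. Assuming Java 9 or later, the JDK ships an OASIS XML Catalog implementation in `javax.xml.catalog`; the sketch below uses it in place of the Xerces `XMLCatalogResolver` (a swapped-in alternative, not what Nutch uses). The helper name `CatalogParsing`, the catalog file, and the DTD's location next to it are all assumptions.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: resolves DTDs through an OASIS XML catalog (Java 9+).
// Assumed catalog content, with xhtml1-strict.dtd stored next to the catalog file:
//   <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
//     <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN" uri="xhtml1-strict.dtd"/>
//   </catalog>
final class CatalogParsing {
    static Document parse(String xml, URI catalogUri) throws Exception {
        // CatalogFeatures.defaults(): "prefer" is public, so the public entry applies
        // even though the DOCTYPE also carries a system ID; "resolve" is strict, so an
        // identifier missing from the catalog fails instead of going to the network.
        CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Keeping the mapping in a catalog file rather than in code means the same local DTD copies can be shared by every parser that installs the resolver.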
Re: Reply: Re: Why does TestNodeWalker keep failing?

marcel.schnippe@... wrote:
> Hi all,
>
> According to W3C's Excessive DTD Traffic
> <http://www.w3.org/2005/06/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic>, we
> should not download any DTD, because
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" denotes a
> namespace, not a resource, although it looks and works like a URI.
>
> > A while ago we put a system in place to monitor our servers for
> > abusive request patterns and send 503 Service Unavailable responses
> > with custom text depending on the nature of the abuse.
> [...]

Thanks Marcel, this confirms my suspicion. The proper fix is to keep local copies of the DTDs and set an XMLCatalogResolver on every XML parser so that it reads those local copies. An interim workaround for TestNodeWalker is to turn off validation and the loading of external entities; I verified that the test passes then.

-- 
Best regards,
Andrzej Bialecki <><
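The interim workaround can be sketched with JAXP's `DocumentBuilderFactory`; the test itself uses Xerces' `DOMParser`, whose `setFeature` accepts the same feature URIs. `load-external-dtd` is a Xerces feature that the JDK's built-in parser also recognizes, and the other two are standard SAX features. The helper name `OfflineParsing` is hypothetical. The trade-off: with the external DTD unread, any entity it declares (for example `&nbsp;`) can no longer be expanded, so this only suits pages that avoid such entities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses without touching any external DTD or entity.
final class OfflineParsing {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // No DTD validation (already the JAXP default, set explicitly for clarity).
        factory.setValidating(false);
        // Xerces feature: do not fetch the external DTD named in the DOCTYPE.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        // Standard SAX features: do not load external general or parameter entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```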