Resolving DTD system URI with XMLCatalogResolver

by cbowditch

I read the Xerces-J website help on setting up the XMLCatalogResolver but can't seem to get it working. I need to use it when parsing HTML with the standard DTD declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

w3.org have recently blocked access to all DTDs on their site, so if I can't get this working then I can't use Xerces to parse HTML!

Here is my test code:

<code>
            XMLCatalogResolver resolver = new XMLCatalogResolver();
            resolver.setCatalogList(new String[] {"c:\\data\\4.0-patch\\Java\\Server\\lib_doNotDistribute\\thcatalog.cat"});
            parser.getXMLReader().setProperty("http://apache.org/xml/properties/internal/entity-resolver", resolver);
            parser.getXMLReader().setContentHandler(new TestResolver());
            parser.getXMLReader().parse(new InputSource(fis));
</code>
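Here is a self-contained sketch of the same setup for readers who want something runnable without Xerces on the classpath. It swaps in the JDK's own `javax.xml.catalog` API (Java 9 and later) for Xerces' `XMLCatalogResolver`, so it is not the code from this post. The class and method names, the temp-directory layout and the one-line stand-in DTD are all hypothetical. The stand-in DTD declares only `&nbsp;`, which is enough to show that a DTD was actually loaded through the catalog.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: offline XHTML parsing via the JDK's javax.xml.catalog API,
// used here as a stand-in for Xerces' XMLCatalogResolver.
class OfflineXhtmlParse {
    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";
    static final String SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    // Writes thcatalog.cat plus a tiny stand-in DTD (declaring only &nbsp;) under dir.
    static Path writeCatalog(Path dir) throws IOException {
        Path dtdDir = Files.createDirectories(dir.resolve("dtd"));
        Files.write(dtdDir.resolve("xhtml1-transitional.dtd"),
                "<!ENTITY nbsp \"&#160;\">\n".getBytes(StandardCharsets.UTF_8));
        String catalogXml = "<?xml version=\"1.0\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <public publicId=\"" + PUBLIC_ID + "\"\n"
                + "          uri=\"dtd/xhtml1-transitional.dtd\"/>\n"
                + "</catalog>\n";
        Path catalog = dir.resolve("thcatalog.cat");
        Files.write(catalog, catalogXml.getBytes(StandardCharsets.UTF_8));
        return catalog;
    }

    // Parses xhtml with SAX, letting the catalog resolve the DOCTYPE's external DTD.
    static String parseText(Path catalog, String xhtml) throws Exception {
        CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver); // CatalogResolver is an org.xml.sax.EntityResolver
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xhtml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        Path catalog = writeCatalog(Files.createTempDirectory("xhtml-catalog"));
        String doc = "<!DOCTYPE html PUBLIC \"" + PUBLIC_ID + "\" \"" + SYSTEM_ID + "\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a&nbsp;b</p></body></html>";
        System.out.println(parseText(catalog, doc));
    }
}
```

Registering the resolver with plain SAX `setEntityResolver` avoids the Xerces-specific `internal/entity-resolver` property. Xerces' `XMLCatalogResolver` also implements SAX's `EntityResolver2`, so it could be registered the same way.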

I've done a little debugging in the Xerces code, and the resolver gives up because the namespace is null; see the following snippet from the source of XMLCatalogResolver:

<code>
        // The namespace is useful for resolving namespace aware
        // grammars such as XML schema. Let it take precedence over
        // the external identifier if one exists.
        String namespace = resourceIdentifier.getNamespace();
        if (namespace != null) {
            resolvedId = resolveURI(namespace);
        }
</code>

I'm guessing that the namespace really only applies to XSD references and not to DTDs, or have I misunderstood? Does anyone else have this working? My catalog file matches the many examples available on the web, but I include it here for completeness:

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="dtd/xhtml1-transitional.dtd"/>

</catalog>
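The `uri` in this entry is relative, and XML Catalogs resolve relative entry URIs against the catalog file's own location (its base URI), not the process's working directory. So moving the catalog, or pointing the resolver at a different copy of it, changes where the DTD is looked up. A minimal sketch of that resolution step using only `java.net.URI`; the class name and the catalog location are hypothetical, and it assumes no `xml:base` attribute in the catalog.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class illustrating base-URI resolution of catalog entries.
class CatalogEntryBase {
    // Resolves a catalog entry's uri attribute against the catalog file's location,
    // assuming no xml:base attribute overrides the base inside the catalog.
    static URI resolveEntryUri(Path catalogFile, String entryUri) {
        URI base = catalogFile.toAbsolutePath().toUri();
        return base.resolve(entryUri); // absolute entry URIs are returned unchanged
    }

    public static void main(String[] args) {
        // Hypothetical catalog location; substitute the real one.
        Path catalog = Paths.get("/data/catalogs/thcatalog.cat");
        System.out.println(resolveEntryUri(catalog, "dtd/xhtml1-transitional.dtd"));
    }
}
```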


Thanks,

Chris

Re: Resolving DTD system URI with XMLCatalogResolver

by Michael Glavassevich-3

Hi Chris,

cbowditch <bowditch_chris@...> wrote on 06/11/2009 04:23:01 PM:

> I read the Xerces-J website help on setting up the XMLCatalogResolver but
> can't seem to get it working. I need to use it when parsing HTML with the
> standard DTD declaration:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>
> w3.org have recently blocked access to all DTDs on their site, so if I can't
> get this working then I can't use Xerces to parse HTML!
>
> Here is my test code:
>
> <code>
>             XMLCatalogResolver resolver = new XMLCatalogResolver();
>             resolver.setCatalogList(new String[] {"c:\\data\\4.0-patch\\Java\\Server\\lib_doNotDistribute\\thcatalog.cat"});
>             parser.getXMLReader().setProperty("http://apache.org/xml/properties/internal/entity-resolver", resolver);
>             parser.getXMLReader().setContentHandler(new TestResolver());
>             parser.getXMLReader().parse(new InputSource(fis));
> </code>
>
> I've done a little debugging in the xerces code and the resolver gives up
> because the namespace is null, see the following snippet from the code of
> XMLCatalogResolver:
>
> <code>
>         // The namespace is useful for resolving namespace aware
>         // grammars such as XML schema. Let it take precedence over
>         // the external identifier if one exists.
>         String namespace = resourceIdentifier.getNamespace();
>         if (namespace != null) {
>             resolvedId = resolveURI(namespace);
>         }
> </code>

It doesn't give up. There is more code after that which tries to resolve the publicId / systemId.

> I'm guessing that the namespace is something that really only applies to XSD
> references and not DTDs or did I misunderstand.


Right. That section of the code only applies to XSDs.

> Does anyone else have this working? My catalog file matches the many examples
> available on the web but I include here for completeness:
>
> <?xml version="1.0"?>
> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
>
>   <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
>           uri="dtd/xhtml1-transitional.dtd"/>
>
> </catalog>


Perhaps it has something to do with the publicId in the catalog not matching what you have in your document. "-//W3C//DTD XHTML 1.0 Transitional//EN" vs. "-//W3C//DTD XHTML 1.0 Strict//EN"?
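One quick way to test this suspicion is to query the catalog directly for the public ID the document declares. The sketch below does that with the JDK's `javax.xml.catalog` API (Java 9+) rather than Xerces, using a throwaway catalog that contains the Strict entry from the quoted file; the class and method names are hypothetical. Xerces' `XMLCatalogResolver` has a comparable `resolvePublic(publicId, systemId)` method that returns null when no mapping is found.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper class for checking which public IDs a catalog actually maps.
class PublicIdCheck {
    // Writes a one-entry catalog mapping publicId to a relative DTD path, as in the post.
    static Path writeCatalog(String publicId) {
        try {
            Path catalog = Files.createTempDirectory("catalog").resolve("thcatalog.cat");
            String xml = "<?xml version=\"1.0\"?>\n"
                    + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                    + "  <public publicId=\"" + publicId + "\" uri=\"dtd/xhtml1-transitional.dtd\"/>\n"
                    + "</catalog>\n";
            Files.write(catalog, xml.getBytes(StandardCharsets.UTF_8));
            return catalog;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the URI mapped to publicId, or null when no public entry matches it.
    static String lookup(Path catalog, String publicId) {
        Catalog c = CatalogManager.catalog(CatalogFeatures.defaults(), catalog.toUri());
        return c.matchPublic(publicId);
    }

    public static void main(String[] args) {
        Path catalog = writeCatalog("-//W3C//DTD XHTML 1.0 Strict//EN");
        // The document declares Transitional, so this lookup finds no entry.
        System.out.println(lookup(catalog, "-//W3C//DTD XHTML 1.0 Transitional//EN"));
        System.out.println(lookup(catalog, "-//W3C//DTD XHTML 1.0 Strict//EN"));
    }
}
```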

> Thanks,
>
> Chris


Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@...

E-mail: mrglavas@...


Re: Resolving DTD system URI with XMLCatalogResolver

by cbowditch

Hi Michael,

Michael Glavassevich-3 wrote:
Hi Chris,

cbowditch <bowditch_chris@hotmail.com> wrote on 06/11/2009 04:23:01 PM:

<snip/>

> <code>
>         // The namespace is useful for resolving namespace aware
>         // grammars such as XML schema. Let it take precedence over
>         // the external identifier if one exists.
>         String namespace = resourceIdentifier.getNamespace();
>         if (namespace != null) {
>             resolvedId = resolveURI(namespace);
>         }
> </code>

MG> It doesn't give up. There is more code after that which tries to resolve the publicId / systemId.

CB> Yes, you are right. I don't know how I missed it (my only excuse is that it's late here in the UK!)

> I'm guessing that the namespace is something that really only applies to XSD references and not DTDs or did I misunderstand.

MG> Right. That section of the code only applies to XSDs.

> Does anyone else have this working? My catalog file matches the many examples available on the web but I include here for completeness:
>
> <?xml version="1.0"?>
> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
>
>   <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
>           uri="dtd/xhtml1-transitional.dtd"/>
>
> </catalog>

MG> Perhaps it has something to do with the publicId in the catalog not matching what you have in your document. "-//W3C//DTD XHTML 1.0 Transitional//EN" vs. "-//W3C//DTD XHTML 1.0 Strict//EN"?

CB> That was a mistake in my original post; I did try to amend the Nabble entry. Anyway, the catalog on my file system has the correct public ID.

CB> The actual problem was just as silly: the path to the catalog was incorrect. Grrr.
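Since the culprit here turned out to be the catalog path, it can be worth failing fast on that before handing the path to the resolver, because in this thread a wrong path only surfaced as a failed lookup. A minimal sketch: `toCatalogUri` is a hypothetical helper, not part of Xerces. The Xerces javadoc describes the `setCatalogList` argument as an ordered array of absolute URIs, so this sketch assumes a `file:` URI is wanted rather than a raw Windows path like the `c:\\data\\...` string in the test code; check that against the Xerces version in use.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; not part of Xerces.
class CatalogPathCheck {
    // Converts a filesystem path into an absolute file: URI string,
    // failing loudly if the catalog file does not exist.
    static String toCatalogUri(String path) {
        Path p = Paths.get(path).toAbsolutePath().normalize();
        if (!Files.isRegularFile(p)) {
            throw new IllegalArgumentException("Catalog file not found: " + p);
        }
        return p.toUri().toString();
    }

    public static void main(String[] args) {
        String[] catalogs = {toCatalogUri(args[0])};
        // With Xerces on the classpath, the array would then be passed on:
        // resolver.setCatalogList(catalogs);
        System.out.println(catalogs[0]);
    }
}
```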

CB> Now I just have to get the JAXP Catalog Resolver working.

Thanks for your help,

Chris
