[tagsoup-friends] How to parse XML document with default namespace with JDOM XPath


by Jack Bush


Hi All,

 

I am having difficulty parsing an XHTML document that declares a default namespace, using Saxon and the TagSoup parser. The relevant content of this document is as follows:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
……..
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title=" Hollywood , CA "> hollywood </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Jose , CA "> san jose </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Francisco , CA "> san francisco </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Diego , CA "> San diego </a>
                    </td>
                </tr>
……….
</body>
</html>

 

Below is the relevant code snippet, which illustrates how I have attempted to retrieve the contents (the text value of each <a>):

 

             import java.io.*;
             import java.util.*;
             import org.jdom.*;
             import org.jdom.input.SAXBuilder;
             import org.jdom.xpath.*;
             import org.saxpath.*;
             import org.ccil.cowan.tagsoup.Parser;

( 1 )       FileReader frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )       BufferedReader brInHtml = new BufferedReader(frInHtml);
( 3 ) //    SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )       SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )       org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )       XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )       xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");  // bind "ns" to the page's default (XHTML) namespace
( 8 )       java.util.List list = xpath.selectNodes(jdomDocument);
( 9 )       Iterator iterator = list.iterator();
( 10 )     while (iterator.hasNext())
( 11 )     {
( 12 )            Object object = iterator.next();
( 13 ) //         if (object instanceof Element)
( 14 ) //               System.out.println(((Element) object).getTextNormalize());
( 15 )             if (object instanceof Content)                       // each match should be an <a> Element
( 16 )                   System.out.println(((Content) object).getValue());  // text content of the <a>
              }

….

 

This program works on the same document without the default namespace, in which case the “ns” prefix in the XPath statements (lines 6-7) is not needed either. Moreover, in the past I successfully retrieved the content of <a> from the same document, without the default namespace, using “org.apache.xerces.parsers.SAXParser”.
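In XPath 1.0 an unprefixed step such as `html` matches only elements that are in no namespace, so once the page declares `xmlns="http://www.w3.org/1999/xhtml"` every element step needs a prefix bound to that URI, which is what lines 6-7 do through JDOM. A minimal sketch of the same rule follows, using the JDK's built-in `javax.xml.xpath` API in place of JDOM/Saxon so that it runs without extra jars; the class name, the trimmed `SAMPLE` page and the `ns` prefix are illustrative assumptions. Attribute tests such as `@id` stay unprefixed, because a default namespace never applies to attributes. The descendant step `//ns:a` is used so the path tolerates wrapper elements (for example `tbody`) that an HTML parser may insert, and the last query shows a prefix-free alternative based on `local-name()`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DefaultNamespaceXPath {
    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Trimmed copy of the sample page (DOCTYPE left out so nothing is fetched).
    public static final String SAMPLE =
        "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
      + "<div id='container'><div id='content'><table class='sresults'><tr>"
      + "<td><a href='http://www.abc.com/areas'>hollywood</a></td>"
      + "<td><a href='http://www.abc.com/areas'>san jose</a></td>"
      + "</tr></table></div></div></body></html>";

    // Binds the single (hypothetical) prefix "ns" to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "ns" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("ns").iterator()
                : Collections.<String>emptyList().iterator();
        }
    };

    // Namespace-aware parse, so elements keep the XHTML namespace (as in JDOM).
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression and returns the trimmed text of each matched node.
    public static List<String> select(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(XHTML_CONTEXT);
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // Unprefixed names mean "no namespace" in XPath 1.0: nothing matches.
        System.out.println(select(doc, "/html/body//a"));
        // Prefixed names bound to the XHTML namespace do match.
        System.out.println(select(doc,
            "/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']"
          + "/ns:table[@class='sresults']//ns:a"));
        // Prefix-free alternative: compare local names only.
        System.out.println(select(doc, "//*[local-name()='a']"));
    }
}
```

Running `main` should print an empty list for the unprefixed path and the two link texts for each of the other two queries.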

 

I would like to achieve the following objectives if possible:

 

(i) Exclude the DTD and namespace in order to simplify the parsing process. How could this be done?

(ii) If this is not possible, how should the namespace be included in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?

(iii) Would changing from “org.apache.xerces.parsers.SAXParser” to “org.ccil.cowan.tagsoup.Parser” make any difference as far as using XPath is concerned?

(iv) Failing to exclude the DTD, how can the lookup of the PUBLIC DTD be changed to a local SYSTEM one, so that a local copy of the DTD is used for reference?
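For (i) and (iv), one possible approach is sketched below with the JDK's own DOM parser rather than JDOM or TagSoup. An `org.xml.sax.EntityResolver` intercepts the PUBLIC identifier and hands the parser a local file instead of letting it fetch the W3C URL; JDOM's `SAXBuilder` accepts the same kind of resolver through `setEntityResolver`. Parsing with namespace awareness switched off (the DOM default) leaves elements without a namespace, and in my understanding the JDK's XPath engine then matches plain paths such as `/html/body`. The class name, the `writeStubDtd` helper and the one-comment stub DTD are hypothetical; a real setup would save the W3C DTD, together with the entity files it references, locally so that entities such as `&nbsp;` still resolve.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdNoNamespace {
    static final String XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    public static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
      + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
      + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
      + "<table class=\"sresults\"><tr>"
      + "<td><a href=\"http://www.abc.com/areas\">hollywood</a></td>"
      + "<td><a href=\"http://www.abc.com/areas\">san jose</a></td>"
      + "</tr></table></body></html>";

    // Redirects the XHTML PUBLIC identifier to a local file; any other
    // identifier falls back to the parser's default lookup (return null).
    static EntityResolver localDtdResolver(final File localDtd) {
        return new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                if (XHTML_PUBLIC_ID.equals(publicId)) {
                    return new InputSource(localDtd.toURI().toString());
                }
                return null;
            }
        };
    }

    // Hypothetical helper: writes a placeholder standing in for a local DTD copy.
    public static File writeStubDtd() throws IOException {
        File dtd = File.createTempFile("xhtml1-transitional", ".dtd");
        dtd.deleteOnExit();
        Writer w = new FileWriter(dtd);
        try {
            w.write("<!-- local stand-in for xhtml1-transitional.dtd -->\n");
        } finally {
            w.close();
        }
        return dtd;
    }

    // Non-namespace-aware parse, so unprefixed XPath names match the elements.
    public static List<String> linkTexts(String xml, File localDtd) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setEntityResolver(localDtdResolver(localDtd));
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("/html/body/table[@class='sresults']//a", doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(linkTexts(SAMPLE, writeStubDtd()));
    }
}
```

Because the resolver answers for the PUBLIC identifier, the parser never opens the `http://www.w3.org/...` system URL, which also keeps the parse working offline.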

 

I am running JDK 1.6.0_06, NetBeans 6.1, JDOM 1.1, Saxon6-5-5 and TagSoup 1.2 on the Windows XP platform.

 

Any assistance would be appreciated.

 

Thanks in advance,

 

Jack

_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...