Cannot close an XML file used for parsing

by Jack Bush

Hi All,

I appear to have difficulty closing (possibly after flushing it first) an XML file that is subsequently parsed without success. The error generated is:

org.jdom.input.JDOMParseException: Error on line 23: The element type "form" must be terminated by the matching end-tag "</form>".

Below are the code snippets of readData(), which retrieves (HTML) data from a website, saves it to a file, then converts it to XML format before returning the new filename:
public String readData() {
    // Declared outside the try blocks so the finally blocks can see them.
    InputStream isInHtml = null;
    DataInputStream disInHtml = null;
    FileOutputStream fosOutHtml = null;
    try {
        URL url = new URL("http://www.abc.com");
        URLConnection connection = url.openConnection();
        isInHtml = url.openStream();   // throws an IOException
        disInHtml = new DataInputStream(new BufferedInputStream(isInHtml));
        System.out.flush();
        fosOutHtml = new FileOutputStream("C:\\Temp\\ABC.html");
        int oneChar, count = 0;
        // Copy the page to disk one byte at a time.
        while ((oneChar = disInHtml.read()) != -1)
            fosOutHtml.write(oneChar);
    }
    catch { ... }
    finally {
        isInHtml.close();
        disInHtml.close();
        fosOutHtml.flush(); // optional
        fosOutHtml.close();
    }

    // Declared outside the try block so the finally block and the return can see them.
    File fileInHtml = new File("C:\\Temp\\ABC.html");
    FileWriter fwOutXml = null;
    PrintWriter pwOutXml = null;
    try {
        FileReader frInHtml = new FileReader(fileInHtml);
        BufferedReader brInHtml = new BufferedReader(frInHtml);
        String string = "";
        // Read the saved HTML back into a single string.
        while (brInHtml.ready())
            string += brInHtml.readLine() + "\n";
        fwOutXml = new FileWriter("C:\\Temp\\ABC.xml");
        pwOutXml = new PrintWriter(fwOutXml);
        light_html2xml html2xml = new light_html2xml();
        pwOutXml.print(html2xml.Html2Xml(string));
    }
    catch { ... }
    finally {
        fwOutXml.close();
        pwOutXml.close();
    }
    return fileInHtml.getAbsolutePath();
}
 
// parseData reads the XML file using the name returned by readData()
public void parseData(String XMLFilename)
{
    try
    {
        FileReader frInXml = new FileReader(XMLFilename);
        BufferedReader brInXml = new BufferedReader(frInXml);
        SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser"); // JDOMParseException generated.
        ....
}
This code worked when it was all in a single method, but I have since added some structure by splitting it across a number of methods.

This issue has arisen in the past, and back then I was able to close the XML file prior to reading it again. However, I don't have a solution for it this time round.

I am running JDK 1.6.0_10, NetBeans 6.1 and JDOM 1.1 on the Windows XP platform.

Any assistance would be appreciated.

Many thanks,

Jack


_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...

Re: Cannot close an XML file used for parsing

by Tatu Saloranta

--- On Tue, 10/28/08, Jack Bush <netbeansfan@...> wrote:

> From: Jack Bush <netbeansfan@...>
> Subject: [jdom-interest] Cannot close an XML file used for parsing
> To: jdom-interest@...
> Date: Tuesday, October 28, 2008, 7:03 AM
> Hi All,
>
> I appear to have difficulty closing (possibly after flushing
> it first) an XML file that is subsequently parsed
> without success. The error generated is:
>
> org.jdom.input.JDOMParseException: Error on line 23: The
> element type "form" must be terminated by the
> matching end-tag "</form>".
>
> Below are the code snippets of readData(), which retrieves
> (HTML) data from a website, saves it to a file, then converts
> it to XML format before returning the new filename:
...

But XML parsers do not convert HTML: either the content is well-formed XML, or it is not. Based on the error message, it looks like it is not (in HTML you can omit all kinds of things without problems; not so in XML).
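
To illustrate the point with the JDK's built-in parser, here is a minimal sketch. The class name WellFormedCheck, the method firstError and the sample strings are made up for illustration; they are not from the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns null if the text is well-formed XML, otherwise the parser's error message. */
    public static String firstError(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            // This is the kind of error JDOM wraps in a JDOMParseException.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Legal HTML, but <input> is never closed, so it is not well-formed XML.
        String html = "<html><body><form><input name=\"q\"></form></body></html>";
        // The same markup with the empty element closed.
        String xhtml = "<html><body><form><input name=\"q\"/></form></body></html>";
        System.out.println("html:  " + firstError(html));
        System.out.println("xhtml: " + firstError(xhtml));
    }
}
```

The first string should be rejected with an error about a missing end-tag, much like the one in the original post, while the second should parse cleanly. How the file was closed makes no difference to that result.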

If you need to process HTML, what you need is an HTML parser that can expose the content as if it were XML. My favorite is TagSoup, but there are many other alternatives, such as JTidy and Neko.
After this step you can use JDOM to build a tree model for processing the content.
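
A rough sketch of that two-step approach, assuming the TagSoup and JDOM 1.x jars are on the classpath. The class and method names (HtmlToJdom, parseHtml) and the sample markup are hypothetical. org.ccil.cowan.tagsoup.Parser is TagSoup's SAX parser class, and it is handed to JDOM's SAXBuilder by name in place of Xerces:

```java
import java.io.IOException;
import java.io.StringReader;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class HtmlToJdom {

    /** Builds a JDOM tree from HTML text, with TagSoup acting as the SAX parser. */
    public static Document parseHtml(String html) throws JDOMException, IOException {
        // TagSoup reports sloppy HTML as a well-formed stream of SAX events,
        // so JDOM never sees the unclosed tags.
        SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
        return builder.build(new StringReader(html));
    }

    public static void main(String[] args) throws Exception {
        // Deliberately sloppy HTML: unquoted attribute, unclosed elements.
        Document doc = parseHtml("<html><body><form><input name=q></form>");
        // Print the repaired tree as XML.
        new XMLOutputter(Format.getPrettyFormat()).output(doc, System.out);
    }
}
```

With this arrangement the intermediate HTML-to-XML conversion step and its temporary file may not be needed at all, since SAXBuilder.build() also accepts a File, an InputStream or a URL.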

Hope this helps,

-+ Tatu +-