Converting elements from a SAX stream to JDOM elements

Hello,
I have a long (infinite) XML stream, which I intend to parse with SAX. Each individual element in the stream is small, and should be parsed with JDOM:

    <stream>
      <element>...</element>
      <element>...</element>
    </stream>

So each <element> (and its children) is parsed with JDOM, but the <stream> as a whole is parsed with SAX. It would be preferable if each <element> did not have to be serialized to an encoded string, and if elements were not processed twice (e.g., using SAX to echo the <element>'s XML to a stream, which is then read by JDOM).

I've found several references to this problem from the past, but could not find a complete solution.

My initial approach was to use the SAXHandler, like so:

    jdomHandler = new SAXHandler() {
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("an element which I want JDOM to parse")) {
                // change the SAX handler to myHandler
            } else {
                super.endElement(uri, localName, qName);
            }
        }
    };

    myHandler = new DefaultHandler() {
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if (qName.equals("an element which I want JDOM to parse")) {
                // change the SAX handler to jdomHandler
            }
        }
    };

(Ignoring for now that the endElement() method needs to keep track of its nesting level.)

However, trivially doing the above does not work, and fails after calling jdomHandler.getDocument():

    Exception in thread "main" java.lang.IllegalStateException: Root element not set
        at org.jdom.Document.getRootElement(Document.java:218)

I've looked at the initialization code that JDOM normally runs in the SAXBuilder.build() method, and am reluctant to copy/modify the code, because I suspect it will break with future releases, and can't help but wonder if it would be over-complicating things.

Is there a Right Way(TM) to do this? If so, I might also suggest that it's referenced from the FAQ.
Many thanks,
Colin

_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...
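A likely reading of the exception above: the handler swap happens inside startElement(), so the tree-building handler never receives a startDocument() call or the start tag of the element it is supposed to build, and its document ends up without a root. The following is a minimal sketch of the forwarding pattern, not a JDOM solution. To stay runnable on a plain JDK it uses a TransformerHandler writing into a DOMResult as a stand-in for JDOM's SAXHandler. The class name SubtreeSplitter and its methods are made up for illustration, and namespace prefix mappings are not forwarded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: builds one small tree per target element of a SAX stream. */
class SubtreeSplitter extends DefaultHandler {
    private final String target;
    private final List<Document> trees = new ArrayList<>();
    private final SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
    private TransformerHandler builder; // non-null only while inside a target element
    private DOMResult result;
    private int depth;

    SubtreeSplitter(String target) { this.target = target; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (builder == null && qName.equals(target)) {
            try {
                builder = factory.newTransformerHandler(); // identity: SAX in, DOM out
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            result = new DOMResult();
            builder.setResult(result);
            builder.startDocument(); // the builder needs its own document bracket
            depth = 0;
        }
        if (builder != null) {
            depth++;
            builder.startElement(uri, local, qName, atts); // includes the root start tag
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (builder != null) builder.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (builder == null) return;
        builder.endElement(uri, local, qName);
        if (--depth == 0) { // matching end tag of the target element
            builder.endDocument();
            trees.add((Document) result.getNode());
            builder = null;
        }
    }

    /** Parses xml and returns one Document per target element. */
    static List<Document> split(String xml, String target) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SubtreeSplitter handler = new SubtreeSplitter(target);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.trees;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With JDOM the same shape would presumably apply: a fresh SAXHandler per element, bracketed by startDocument() and endDocument(), followed by getDocument(). That variant is untested here.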
Re: Converting elements from a SAX stream to JDOM elements

Colin.
My instinct would be to investigate a different approach...

Perhaps a mechanism similar to how ZipInputStreams work, where the stream can be read as separate streams for each 'element'.

Build a 'tee' or 'branched' custom InputStream between the main SAX parser and the underlying 'infinite' stream. This intermediate stream can be used to feed 'child' streams to JDOM's SAX parser, while a standard SAX parser terminates each child stream using a mechanism similar to what you described.

This way you have just one 'infinite' stream, and you feed the contents to one 'global' parser which implements 'break logic' on a separate version of the stream which feeds JDOM. When the end of the element is encountered in the main stream it causes the JDOM stream to reach 'end of file', and the JDOM side of things can then open a new 'child' stream for the next 'document'.

No (little) memory overhead. No need to buffer complete documents, etc.

InputStreams are relatively simple to implement ;-)

Rolf
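The 'child stream' half of this idea can be sketched in a few lines. This is only an illustration under assumptions: ChildStream and ChildStreamDemo are hypothetical names, and the hard part (deciding when to call end(), given that a parser may read ahead of the element it is reporting) is deliberately left out.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical: one segment of a shared parent stream; reports EOF once ended. */
class ChildStream extends InputStream {
    private final InputStream parent;
    private boolean ended;

    ChildStream(InputStream parent) { this.parent = parent; }

    /** Called by the controlling code when the current segment is over. */
    void end() { ended = true; }

    @Override
    public int read() throws IOException {
        if (ended) return -1; // looks like end of file to the consumer
        int b = parent.read();
        if (b < 0) ended = true;
        return b;
    }

    /** Ends this segment only; the shared parent stays open for the next child. */
    @Override
    public void close() { ended = true; }
}

class ChildStreamDemo {
    /** Reads up to n bytes through a fresh child, ends it, and returns them as text. */
    static String take(InputStream parent, int n) {
        try {
            ChildStream child = new ChildStream(parent);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                int b = child.read();
                if (b < 0) break;
                sb.append((char) b);
            }
            child.end();
            if (child.read() != -1) throw new IllegalStateException("child should be at EOF");
            return sb.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

As with ZipInputStream entries, closing a child does not close the underlying stream, so the next child picks up exactly where the previous one stopped.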
Re: Converting elements from a SAX stream to JDOM elements

Hello,
Thanks for the feedback, Rolf. I'm afraid I've yet to try your suggestion.

Just before you replied, I tried solving the problem using a different approach. I wrote a dynamic proxy class which implemented DefaultHandler, and forwarded all calls to another DefaultHandler-extending class, which had a boolean isInterested() method. The proxy checks isInterested() after entering an element, and if it returns true, it replays all the calls already received (since the startElement() call), and then forwards all subsequent calls to a new SAXHandler, until it reaches the end of the element. It then forwards the resulting Document to an interested class.

Unfortunately, this didn't work. I've never implemented a dynamic proxy in Java before, and it seems that they can only implement interfaces, not classes. Since DefaultHandler is a class, Java refused to typecast my proxy. I tried various methods such as creating my own DefaultHandlerInterface (which extends the same interfaces as DefaultHandler, but is an interface), overriding the appropriate methods to support it. I'm afraid that in the end, I gave up on this approach.

If I understand your suggestion correctly, then I have a couple of worries before implementing it. My understanding of your suggestion is to synchronise startElement()/endElement() with the InputStream: when startElement() is called, call another method, startDuplicatingStream(), on a class which extends InputStream. That method returns a new InputStream which mirrors the original input stream until the controlling class sees the appropriate endElement() call.

My concerns are:

What if SAX is performing internal buffering of some sort? In this case, it would not be possible to know where exactly to start mirroring the InputStream. I noticed that a handler can obtain a Locator (via setDocumentLocator()), but it only gives the position in terms of line and column numbers. Whether or not the above is the case, there is no documented guarantee that the internal buffering situation will be consistent in future releases, or in other implementing classes.

My other concern is that after startElement() is called, the parser will have already read the <element ...> tag from the InputStream, and so the InputStream needs to know how far back to start mirroring. As far as I can tell, the only way to do that would be to have another proxy class, which marks the InputStream, so that the last mark set before startElement() was called should be just before the '<'. For the reasons described above, the proxy class cannot (I think?) be automatic, and would require me to manually use the same code for each method implemented by DefaultHandler (which isn't significant in terms of effort, but looks a bit messy :-) ).

Please do tell me if I have the wrong end of the stick.

I'm afraid that for the time being, I'm going to resort to serialising the XML elements. Should I implement a solution to this problem in the future, I'll send the code to the list. I think that this problem would do well to be documented on the FAQ, since I imagine that it is not uncommon.

Cheers,
Colin
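The buffering worry can be checked empirically. The sketch below counts how many bytes the parser has pulled from the stream at the moment the first startElement() for a given name fires. CountingInputStream and ReadAheadProbe are made-up names, and the figure returned depends on the parser implementation. That is rather the point: nothing in the SAX API ties a callback to a byte offset in the underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical: counts every byte handed to the consumer of the stream. */
class CountingInputStream extends FilterInputStream {
    long count;

    CountingInputStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }
}

class ReadAheadProbe {
    /** Bytes pulled by the parser when the first start tag named target is reported. */
    static long bytesPulledAtFirst(String xml, String target) {
        try {
            CountingInputStream in = new CountingInputStream(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            long[] seen = { -1 };
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName,
                                         Attributes atts) {
                    if (seen[0] < 0 && qName.equals(target)) {
                        seen[0] = in.count; // snapshot at callback time
                    }
                }
            });
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the returned number is larger than the offset of the end of the start tag, the parser has read ahead, and a mirroring stream started inside startElement() would begin somewhere after the tag it was meant to capture.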
RE: Converting elements from a SAX stream to JDOM elements

> Unfortunately, this didn't work. I've never implemented a
> dynamic proxy in Java before, and it seems that they can only
> implement interfaces, not classes. Since DefaultHandler is a
> class, Java refused to typecast my proxy.

DefaultHandler is a helper class that implements ContentHandler. Your proxy class should either *implement* the interface ContentHandler, or *extend* the class DefaultHandler.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
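A minimal sketch of the first option, assuming a java.lang.reflect.Proxy typed as ContentHandler: the proxy is handed to XMLReader.setContentHandler(), which accepts the interface, so no cast to DefaultHandler is ever needed. SwitchingProxy, Slot and demo() are hypothetical names, and the two handlers only count start tags to show that events reach whichever delegate is current.

```java
import java.io.StringReader;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SwitchingProxy {
    /** Mutable slot holding whichever handler should currently receive events. */
    static final class Slot { ContentHandler current; }

    /** Builds a proxy typed as the ContentHandler interface, not as DefaultHandler. */
    static ContentHandler create(Slot slot) {
        return (ContentHandler) Proxy.newProxyInstance(
                SwitchingProxy.class.getClassLoader(),
                new Class<?>[] { ContentHandler.class },
                (proxy, method, args) -> {
                    try {
                        return method.invoke(slot.current, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // surface the delegate's own exception
                    }
                });
    }

    /** Returns { start tags seen by the outer handler, start tags seen by the inner one }. */
    static int[] demo(String xml, String target) {
        int[] seen = new int[2];
        Slot slot = new Slot();
        ContentHandler[] h = new ContentHandler[2]; // [0] = outer, [1] = inner
        h[0] = new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String qName, Attributes a) {
                seen[0]++;
                if (qName.equals(target)) slot.current = h[1]; // hand over
            }
        };
        h[1] = new DefaultHandler() {
            private int depth = 1; // already inside the target element
            @Override
            public void startElement(String u, String l, String qName, Attributes a) {
                seen[1]++;
                depth++;
            }
            @Override
            public void endElement(String u, String l, String qName) {
                if (--depth == 0) { depth = 1; slot.current = h[0]; } // hand back
            }
        };
        slot.current = h[0];
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(create(slot));
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Note that the start tag of the target element is delivered to the outer handler, which is the event that has to be replayed or forwarded if the inner handler is meant to build a complete tree.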
Re: Converting elements from a SAX stream to JDOM elements

--- On Sat, 7/18/09, Colin Horne <colin@...> wrote:

> From: Colin Horne <colin@...>
> Subject: [jdom-interest] Converting elements from a SAX stream to JDOM
...
>
> I have a long (infinite) XML stream, which I intend to parse with SAX.
> Each individual element in the stream is small, and should
> be parsed with JDOM:
>
> <stream>
>   <element>...</element>
>   <element>...</element>
> </stream>
>
> So each <element> (and their children) are parsed with JDOM, but the
> <stream> as a whole is parsed with SAX. It would be

Do you absolutely have to use SAX for parsing here? This is a rather typical case where the StAX API works well for parsing. Since you control traversal over the input, it is easy to pass a StAX XMLStreamReader to whoever builds JDOM trees, sub-tree by sub-tree.

For what it's worth, I wrote such a builder a while ago:

http://docs.codehaus.org/display/WSTX/StaxMisc

and submitted it for JDOM. Not sure if it's included anywhere (under contribs, maybe).

Hope this helps,

-+ Tatu +-

ps. There are also plans to include this functionality in StaxMate [http://staxmate.codehaus.org] (issue [http://jira.codehaus.org/browse/STAXMATE-9]) -- 2.0 has an equivalent for DOM trees, but no JDOM support yet.
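The pull-parsing approach can be sketched with the JDK alone. This is an illustration under assumptions, not the builder mentioned above: it produces org.w3c.dom trees as a stand-in for JDOM, the class name StaxSubtrees is made up, and namespaces, comments and processing instructions are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StaxSubtrees {
    /**
     * Builds the element the reader is positioned on, including its children.
     * On return the reader sits on that element's matching END_ELEMENT.
     */
    static Element readSubtree(XMLStreamReader r, Document doc) throws XMLStreamException {
        Element e = doc.createElement(r.getLocalName());
        for (int i = 0; i < r.getAttributeCount(); i++) {
            e.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        while (r.next() != XMLStreamConstants.END_ELEMENT) {
            if (r.isStartElement()) {
                e.appendChild(readSubtree(r, doc)); // consumes the child's end tag too
            } else if (r.isCharacters()) {
                e.appendChild(doc.createTextNode(r.getText()));
            }
        }
        return e;
    }

    /** Walks the outer stream and returns one Document per target element. */
    static List<Document> split(String xml, String target) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<Document> out = new ArrayList<>();
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals(target)) {
                    Document doc = db.newDocument();
                    doc.appendChild(readSubtree(r, doc));
                    out.add(doc);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the caller drives the reader, there is no handler swapping and no event replay: the sub-tree builder simply takes over the cursor at the start tag and returns it at the matching end tag.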