best xml parser to use
Hi,
I have a particular problem to solve: I have an XML batch file that contains individual XML invoices. I need to extract these invoices one at a time and place them on a message queue, i.e. I just need to get all the data between the invoice start and end tags, put it in a string and place it on a message queue (validation occurs on the invoice itself on the receiver side).
What is likely to be my best approach in terms of performance: DOM (unlikely, I guess), SAX, StAX, or simply writing a Java program using indexOf?
TIA
Peter
|
|
RE: [xml-dev] best xml parser to use
> What is likely to be my best approach, DOM (unlikely I
> guess), SAX, StAX or simply writing a java program using
> indexOf, in terms of performance ?

Your performance, or the machine's performance? How big is the input file?

Michael Kay
http://www.saxonica.com/
|
|
Re: [xml-dev] best xml parser to use
I don't know much about StAX, but if you'll be processing the data linearly (i.e. with no need to rearrange it as part of your processing), SAX should be fine and quick.

Bob

On Wed, September 6, 2006 9:09 am, petera wrote:
> Hi
>
> I have a particular problem to solve:
>
> I have an xml batch file that contains individual xml invoices. I need to
> extract these xml invoices one at a time and
> place them on a message queue i.e. I just need to get all the data between
> the invoice start and end tags put it
> in a string and place it on a message queue (validation occurs on the
> invoice itself on the receiver side).
>
> What is likely to be my best approach, DOM (unlikely I guess), SAX, StAX
> or
> simply writing a java program using indexOf, in terms of performance ?
>
> TIA Peter
>
> --
> View this message in context:
> http://www.nabble.com/best-xml-parser-to-use-tf2226882.html#a6171113
> Sent from the Xml.org Dev forum at Nabble.com.
|
|
RE: [xml-dev] best xml parser to use
Michael,
The machine's performance. The input files could be quite large (over 1 MB) and will arrive from different sources.
Peter
|
|
|
RE: [xml-dev] best xml parser to use
> the machines performance.
>
> the input files could be quite large over 1mb and arriving
> from different sources.

Well, only you know the performance requirements, but for a file as small as 1 MB many people would do the job in XSLT. It's not the fastest option, but my guess would be that it's probably capable of doing the job, and you'll be left with something that's easier to maintain.

Michael Kay
http://www.saxonica.com/
|
|
|
|
|
Re: [xml-dev] best xml parser to use
If you're coding in Java I'd suggest xmlbeans. I've found xmlbeans fast, easy and quick to employ; it has been very handy in about half a dozen projects now.
You need to compile the schema, which generates Java code that then lets you reference any element directly. Then simply reference the invoice structure's topmost element and do as you wish: write the XML to the queue as simple text; or create a new XML document (this just provides the XML header at the start of the file), add only this copied element to it and write that to the queue; or strip all or selected content first, and so on. Then iterate to the next invoice or batch file. It could be 20 lines of code, tops.
If you don't have a schema to feed into the schema compiler, there are a couple of tools with which you can build a schema, and a couple that infer a schema from sample XML.
KWL
On 9/6/06, petera <peter.anderson@...> wrote:
|
|
|
Re: [xml-dev] best xml parser to use
1) parse document
2) loop through invoice child elements
3) serialize each child element to a String and post
You could use xmlbeans or another data binding framework, but then you're just swapping the generic DOM data model for a schema-specific data model. It doesn't sound like you care about the internal structure and content of an invoice, so generating the new Java classes required for data binding is unnecessary overhead.
On 9/6/06, K. W. Landry <kwlandry@...> wrote:
|
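The three steps listed in the previous message (parse, loop over the child elements, serialize each one to a String) could look roughly like the sketch below, using only the JAXP classes that ship with the JDK. The class name `DomInvoiceSplitter`, the `split` method and the sample `<batch>`/`<invoice>` element names are assumptions made for illustration; posting to the message queue is left out, and the method simply returns the strings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: splits a batch document into one XML string per child element.
class DomInvoiceSplitter {

    static List<String> split(String batchXml) throws Exception {
        // 1) parse the whole batch into a DOM tree (this holds the full file in memory)
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(batchXml)));

        // An identity transformer is used purely as a serializer.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> invoices = new ArrayList<>();
        // 2) loop through the element children of the root (assumed to be the invoices)
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, etc.
            }
            // 3) serialize the subtree to a String; a real program would post it to the queue here
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(n), new StreamResult(out));
            invoices.add(out.toString());
        }
        return invoices;
    }

    public static void main(String[] args) throws Exception {
        String batch = "<batch><invoice id=\"1\"><total>10</total></invoice>"
                     + "<invoice id=\"2\"><total>20</total></invoice></batch>";
        for (String invoice : split(batch)) {
            System.out.println(invoice);
        }
    }
}
```

The whole batch is held in memory at once, which is the cost the original poster was wary of with DOM; for files around 1 MB that is unlikely to matter.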
|
|
Re: [xml-dev] best xml parser to use
I'd agree with Justin on these points; there is excess that can be avoided. The unused classes generated by xmlbeans would simply sit idle while the handful of classes that are actually needed do all the heavy lifting.
The finer point I would add, however, is that generating the code is a one-time effort, though that only holds as long as the schema, or structure, of the XML you're processing never changes.
One more point for discussion: I believe the coding needed for the task described would be most efficient with xmlbeans, in that the xmlbeans implementation will do a lot of the heavy lifting for you, without your having to code many details explicitly, as I believe you would with a strictly DOM-focused approach.
A big caveat, however, is that I haven't used XOM, and have used JDOM and dom4j only minimally compared to xmlbeans. I'm sure others out there have more experience and can weigh in on that thought.
Overall, however, I believe the leanest and meanest approach for this in terms of performance and resource consumption is a StAX implementation.
KWL
On 9/6/06, Justin Edelson <justinedelson@...> wrote:
|
|
|
Re: best xml parser to use
Thanks to all who replied to my e-mail.
The approach I have decided upon is to try two implementations:
1. XSLT
2. StAX
XSLT is really the simplest, but my bosses might not like the memory requirements, so StAX would be a good alternative.
The URL provided by Brennan is excellent on StAX: http://java.sun.com/webservices/docs/1.6/tutorial/doc/SJSXP2.html
Peter
|
|
|
Re: [xml-dev] best xml parser to use
On Sep 6, 2006, at 10:08 PM, K. W. Landry wrote:
This suggestion seems to be extreme overkill, as you're not even interested in the contents of the invoices. In terms of performance, I would expect StAX and SAX to be roughly equal. Something hardcoded that is not XML-aware will be a lot faster, but much more error-prone.
Stefan
--
Stefan Tilkov, http://www.innoq.com/blog/st/
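To make the "hardcoded, not XML-aware" option concrete, here is a minimal sketch of the indexOf approach the original poster mentioned. The class name `IndexOfInvoiceSplitter` and the `invoice` tag name are assumptions for illustration. It also shows why such code is error-prone: it would match `<invoiceLine` as an invoice start, it ignores namespace prefixes, comments and CDATA sections, it breaks on nested or self-closing invoice elements, and it silently drops any namespace declarations made on the batch root.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately naive splitter: plain string search, no XML parsing.
class IndexOfInvoiceSplitter {

    static List<String> split(String batch, String tagName) {
        String open = "<" + tagName;          // also matches "<invoiceLine", "<invoices", ...
        String close = "</" + tagName + ">";
        List<String> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = batch.indexOf(open, from);
            if (start < 0) {
                break;                        // no further start tag
            }
            int end = batch.indexOf(close, start);
            if (end < 0) {
                break;                        // unterminated element: give up silently
            }
            end += close.length();
            result.add(batch.substring(start, end));
            from = end;
        }
        return result;
    }

    public static void main(String[] args) {
        String batch = "<batch><invoice>a</invoice><invoice>b</invoice></batch>";
        // prints [<invoice>a</invoice>, <invoice>b</invoice>]
        System.out.println(split(batch, "invoice"));
    }
}
```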
|
|
|
Re: best xml parser to use
Sorry, I meant thread!!
|
|
|
Re: [xml-dev] Re: best xml parser to use
--- petera <peter.anderson@...> wrote:
> Thanks to all who replied to my e-mail.
>
> The approach I have decided upon is to try two
> implementations:
>
> 1. XSLT
>
> 2. StAX

I would agree with these main choices. While you could use data binding (xmlbeans, jaxb2), that seems a bit of a heavy-weight route, since you don't care about type mappings etc., so most of the work would be overhead. SAX could of course be used, but I don't know of many benefits over Stax for this use case.

> XSLT is really the simplest but my bosses might not
> like the memory
> requirements so StAX would a good alternative
>
> The URL provided by Brennan is excellent on StAX:
> http://java.sun.com/webservices/docs/1.6/tutorial/doc/SJSXP2.html

In addition, if you decide to go the Stax route (which may make sense if you will eventually get bigger files; 1 MB is still fine with about any solution, unless there's lots of concurrent processing), you may want to check out the stax-utils project (https://stax-utils.dev.java.net/), since the 'raw' Stax API is a bit of a PITA to use for many tasks. For copying XML, using the Event API is quite straightforward.

You could also try out StaxMate, which I wrote (http://woodstox.codehaus.org/StaxMate); it supports accessing XML content in a streaming way while still allowing hierarchic traversal (in the forward direction). Documentation is a bit sparse; the best way may be to read the entries at http://www.cowtowncoder.com/blog/blog.html.

I should write sample code for this particular use case, though, since it seems to be a recurring question (usually on the stax_builders list), and it should also be easy to add a sub-tree pass-through copy operation, so that this particular task would be just a couple of lines in total.

-+ Tatu +-
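As a rough illustration of copying XML with the StAX Event API, the sketch below streams through a batch and copies each invoice subtree into its own string. It uses only the standard `javax.xml.stream` classes, not stax-utils or StaxMate. The class name `StaxInvoiceSplitter`, the `invoice` element name and the `Consumer<String>` standing in for the message queue are assumptions for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams a batch and hands each invoice subtree to a sink as a String.
class StaxInvoiceSplitter {

    static void split(Reader batch, String invoiceName, Consumer<String> sink)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(batch);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // Ignore everything until the start tag of an invoice element.
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(invoiceName)) {
                continue;
            }
            StringWriter buffer = new StringWriter();
            XMLEventWriter writer = outFactory.createXMLEventWriter(buffer);
            writer.add(event);
            // Copy events until the matching end tag; depth tracks nested elements.
            int depth = 1;
            while (depth > 0) {
                XMLEvent inner = reader.nextEvent();
                if (inner.isStartElement()) {
                    depth++;
                } else if (inner.isEndElement()) {
                    depth--;
                }
                writer.add(inner);
            }
            writer.close();                   // flushes the copied subtree into the buffer
            sink.accept(buffer.toString());   // a real program would post to the queue here
        }
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        String batch = "<batch><invoice id=\"1\"><total>10</total></invoice>"
                     + "<invoice id=\"2\"><total>20</total></invoice></batch>";
        split(new StringReader(batch), "invoice", System.out::println);
    }
}
```

Only one invoice is buffered at a time, so memory use depends on the largest invoice rather than on the batch size. One caveat: namespace declarations made on the batch root are not automatically redeclared on the copied invoice; setting `XMLOutputFactory.IS_REPAIRING_NAMESPACES` on the output factory may help, but that is untested here.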
|
|
Re: [xml-dev] best xml parser to use
--- Stefan Tilkov <info@...> wrote:
> On Sep 6, 2006, at 10:08 PM, K. W. Landry wrote:
> ...
> In terms of performance, I would expect StAX and SAX
> to be roughly
> equal. Something hardcoded that is not XML-aware
> will be a lot
> faster, but much more error-prone.

Latter yes, former not necessarily. What I found out was that reading a stream using JDK BufferedReader(), reading line by line, was slightly slower than parsing content as XML. Your mileage may vary, but regular stream parsing is surprisingly fast nowadays (30 - 40 MBps on typical desktop machines).

-+ Tatu +-