Performance of Woodstox StAX vs Java SAX


Performance of Woodstox StAX vs Java SAX

by harbhanu


Hi,

I am new to this group. Before I start actively using this parser, I have a question about its performance.

 

According to the parsing benchmarks reported at

http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html?page=1

Woodstox StAX always performs better than even the Java SAX implementation.

Is there any particular reason for these results?

 

Has anyone here done similar performance benchmarking of C- or Java-based parsers against Woodstox?

Thanks!!

 

Regards,

Harbhanu


RE: Performance of Woodstox StAX vs Java SAX

by Michael Kay

Two reasons. Firstly, Woodstox is fast, no question about it. Secondly, I think the API is slightly lower-level. In my experience a pull parser does a little less work, and unless you are very careful, the application has to do a little more. It's very easy to lose all the performance gains by the time you've written your application code.
 
Michael Kay
http://www.saxonica.com/
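
To illustrate the point about who does the work, here is a minimal sketch (not taken from the thread) that counts elements once through SAX and once through StAX, using only the javax.xml APIs bundled with the JDK. The class and method names are made up for illustration, and which parser actually runs depends on what is on the classpath; Woodstox is only picked up if its jar is present.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative only.
class PushVsPull {

    // SAX (push): the parser drives and calls back into our handler.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // StAX (pull): the application owns the loop and checks event types itself.
    static int countWithStax(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>a</item><item>b</item></root>";
        System.out.println("SAX:  " + countWithSax(xml));
        System.out.println("StAX: " + countWithStax(xml));
    }
}
```

In the StAX version the event loop and the dispatch on event type live in application code, which is where any parser-level gain can be given back.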


From: harbhanu [mailto:harbhanu@...]
Sent: 16 August 2007 09:50
To: user@...
Subject: [woodstox-user] Performance of Woodstox StAX vs Java SAX

 



Re: Performance of Woodstox StAX vs Java SAX

by Tatu Saloranta

For what it's worth, now that Woodstox also implements the SAX interface (as of 3.2.0), it is possible to see what effect the API has on an implementation. With Woodstox, access via the SAX API is slightly slower (for the trivial browse-through case), but the difference is minor (5-10%). Conceptually, the SAX and StAX interfaces are both low-level. Although a StAX implementation has to retain more state (and therefore gets more complicated), there are also a couple more optimizations one can do, since more processing can be deferred on an as-needed basis than with SAX. But these do not have a major effect on most usage.
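
As a rough illustration of the "as-needed" idea from the caller's side (a sketch, not Woodstox internals), a pull loop only asks for text on the element it cares about and can stop reading as soon as it has the answer. The class name, method name and sample document are hypothetical, and how much work a given parser really defers is implementation-specific.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names are illustrative only.
class LazyAccess {

    // Returns the text of the first element with the given local name,
    // or null if there is none. Text of other elements is never requested.
    static String firstText(String xml, String wanted) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(wanted)) {
                    // Only here do we ask the parser to hand over character data.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><skip>lots of text</skip>"
                + "<title>Hello</title><more>ignored</more></doc>";
        System.out.println(firstText(xml, "title"));
    }
}
```

With SAX the characters() callback would fire for every text node whether or not the application wants it, so the parser has less room to skip that work.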

Personally, I would like to think that the speed comes from Woodstox having been designed from the beginning with a specific API and functionality in mind (as opposed to having to add new things like namespaces after the fact), and from performance having been one of the top priorities throughout development.

As to Java-vs-C, interestingly there is one fairly recent comparison:

http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html

the results of which seem in line with what I would expect. The fastest (?) C implementation is a bit faster than Woodstox, by roughly the same ratio as Woodstox is faster than Xerces/SAX. Woodstox was the fastest of the Java alternatives.

I hope this helps,

-+ Tatu +-

ps. Just in case it's not obvious, ObDisclaimer is that as the author of Woodstox my opinions may be slightly biased. ;-)

----- Original Message ----
From: Michael Kay <mike@...>
To: user@...
Sent: Thursday, August 16, 2007 2:15:58 AM
Subject: RE: [woodstox-user] Performance of Woodstox StAX vs Java SAX


Re: Performance of Woodstox StAX vs Java SAX

by Tatu Saloranta

Ugh. I should have read the original question completely before answering. I didn't realize it was about the article I referred to. :-)

I have done some benchmarking, both with my own StaxPerf code base (available as the StaxPerf module in the Woodstox repository) and with the performance test suite implemented by the Nux project. The results I have seen are in line with those reported in the article. I also recall Michael doing some benchmarking, mentioned in his blog. Based on these, my usual approximation is that Woodstox tends to be 20-40% faster at parsing than the fastest Java alternatives. The difference is usually bigger for smaller input (i.e. Woodstox's fixed per-document overhead is smaller), and smaller for huge files where things like I/O dominate.
This assumes the alternatives are used in a somewhat optimal way, i.e. reusing SAX reader instances and StAX factories. Not doing so gives inflated overheads.
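
For anyone setting up their own comparison, the factory-reuse point can be sketched roughly like this. It is a toy harness under stated assumptions, not StaxPerf: the class and method names are hypothetical, there is no JIT warm-up, and the timings it prints should not be read as a real benchmark.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names are illustrative only.
class FactoryReuse {

    // Walks through one document and returns the number of events seen.
    static int countEvents(XMLInputFactory factory, String xml) throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int events = 0;
        try {
            while (reader.hasNext()) {
                reader.next();
                events++;
            }
        } finally {
            reader.close();
        }
        return events;
    }

    // Intended usage: one factory shared across all documents.
    static long parseReusingFactory(String xml, int docs) throws XMLStreamException {
        XMLInputFactory shared = XMLInputFactory.newInstance();
        long events = 0;
        for (int i = 0; i < docs; i++) {
            events += countEvents(shared, xml);
        }
        return events;
    }

    // Anti-pattern: a fresh factory lookup for every document.
    static long parseRecreatingFactory(String xml, int docs) throws XMLStreamException {
        long events = 0;
        for (int i = 0; i < docs; i++) {
            events += countEvents(XMLInputFactory.newInstance(), xml);
        }
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><item>a</item><item>b</item></root>";
        int docs = 10_000;

        long t0 = System.nanoTime();
        parseReusingFactory(xml, docs);
        long t1 = System.nanoTime();
        parseRecreatingFactory(xml, docs);
        long t2 = System.nanoTime();

        System.out.println("reused factory:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("recreated factory: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both variants do identical parsing work per document, so any gap between the two numbers is per-document setup overhead, which is exactly what shows up most on small inputs.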

One often-neglected question is output (XML writing) performance. Although I haven't seen quite as much testing with writers, Woodstox's performance seems good there too. Some packages have surprisingly slow output rates (not Xerces, but some others, such as alternative StAX implementations).

I haven't had a chance to test native (C) implementations, and in general it is more challenging to compare these to Java; fair comparisons are much harder to do.
That is why I was happy to see the comparison in the article, even though the article itself was a little short on details of the actual test environment and setup.

-+ Tatu +-
