Improving performance of SAX parser configuration

Greetings jdom-interest,

I've run across an interesting performance issue in the way JDOM handles Xerces parser configuration even when reuseParser is enabled in SAXBuilder, and I wanted to run this by the list - not only for validation, but hopefully something along the lines of this improvement can get rolled in (yep, I know JDOM is in maintenance mode).

For a bit of background, the particular case we have involves parsing lots and lots of little XML document fragments via SAXBuilder.build() - not terribly efficient, but for what we use JDOM for it's a pre-existing condition that we're stuck with.

What I found is that more time was spent in configureParser() than in actually parsing the XML. The reason for this is attempting to set options on the parser which don't exist in Xerces - or at least the version of it we are using. This results in SAXNotRecognizedExceptions. Exceptions are expensive, plus Xerces does ResourceBundle lookups each time. While we do set reuseParser, each execution of build() still reconfigures the underlying parser.

I know that the contentHandler, and perhaps other options, are not reusable, and this doesn't change the semantics of that. Since the underlying parser is unlikely to suddenly start supporting some option it didn't support before, it's possible to remember whether or not the underlying parser implementation was able to support a property, and skip attempting to configure it if not. I wired this as a specific option only used with reuseParser to be safe, but it's possible this could be done in a more generic manner that would benefit other codepaths and usages as well (it would simplify my patch somewhat, but I wanted to be safe since there may be other consequences of this which I've overlooked).

Again, I wouldn't expect this to help cases where larger XML is handled less frequently, but for my case where it's hundreds of XML fragments per transaction per second, this fix reduces the execution time of SAXBuilder.build() by about half.
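A minimal sketch of the failure mode described above, assuming only the standard SAX API that ships with the JDK. `PropertyProbe` and `probe` are hypothetical names used for illustration and are not part of JDOM. The two lexical-handler property names are the ones configureParser() tries; which of them a given parser accepts depends on the parser, so the sketch prints the outcome rather than assuming it.

```java
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper, not part of JDOM: reports how a reader reacts to a property. */
class PropertyProbe {

    /** Returns "accepted", or the simple name of the exception the reader threw. */
    static String probe(XMLReader reader, String name, Object value) {
        try {
            reader.setProperty(name, value);
            return "accepted";
        } catch (SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException both
            // extend SAXException, so one catch covers the two failure cases.
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // DefaultHandler2 implements LexicalHandler and DeclHandler, so it is
        // a valid value for all three properties below.
        DefaultHandler2 handler = new DefaultHandler2();
        String[] names = {
            "http://xml.org/sax/handlers/LexicalHandler",
            "http://xml.org/sax/properties/lexical-handler",
            "http://xml.org/sax/properties/declaration-handler"
        };
        for (String name : names) {
            System.out.println(name + " -> " + probe(reader, name, handler));
        }
    }
}
```

Every rejected name costs one exception per call, which is the per-build() overhead the patch is meant to avoid.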
I would love to hear any feedback as well as find out if anyone else has the same sort of performance improvements I've seen with this patch in cases where lots of small documents are parsed.

Thanks for your time,
-Scott

===CUT HERE===
diff --git a/src/java/org/jdom/input/SAXBuilder.java b/src/java/org/jdom/input/SAXBuilder.java
index 09fbb00..1627345 100644
--- a/src/java/org/jdom/input/SAXBuilder.java
+++ b/src/java/org/jdom/input/SAXBuilder.java
@@ -134,6 +134,15 @@ public class SAXBuilder {
     /** User-specified properties to be set on the SAX parser */
     private HashMap properties = new HashMap(5);
 
+    /** Whether to use fast parser reconfiguration */
+    private boolean fastReconfigure = false;
+
+    /** Whether to try lexical reporting in fast parser reconfiguration */
+    private boolean tryLexicalReportingConfig = true;
+
+    /** Whether to try entity expansion in fast parser reconfiguration */
+    private boolean tryEntityExpandConfig = true;
+
     /**
      * Whether parser reuse is allowed.
      * <p>Default: <code>true</code></p>
@@ -396,6 +405,25 @@ public class SAXBuilder {
     }
 
     /**
+     * Specifies whether this builder will do fast reconfiguration of the
+     * underlying SAX parser when reuseParser is true. This improves
+     * performance in cases where SAXBuilders are reused and lots of small
+     * documents are frequently parsed. This avoids attempting to set features
+     * on the SAX parser each time build() is called which result in
+     * SAXNotRecognizedExceptions. This should ONLY be set for builders where
+     * this specific case is an issue. The default value of this setting is
+     * <code>false</code> (no fast reconfiguration). If reuseParser is false,
+     * calling this has no effect.
+     *
+     * @param fastReconfigure Whether to do a fast reconfiguration of the parser.
+     */
+    public void setFastReconfigure(boolean fastReconfigure) {
+        if (this.reuseParser) {
+            this.fastReconfigure = fastReconfigure;
+        }
+    }
+
+    /**
      * This sets a feature on the SAX parser. See the SAX documentation for
      * </p>
      * <p>
@@ -657,42 +685,76 @@ public class SAXBuilder {
             parser.setErrorHandler(new BuilderErrorHandler());
         }
 
-        // Setup lexical reporting.
-        boolean lexicalReporting = false;
-        try {
-            parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
-                               contentHandler);
-            lexicalReporting = true;
-        } catch (SAXNotSupportedException e) {
-            // No lexical reporting available
-        } catch (SAXNotRecognizedException e) {
-            // No lexical reporting available
-        }
+        /* If fastReconfigure is enabled and we failed in the previous attempt
+         * in configuring lexical reporting, then skip this step.
+         */
+        if (tryLexicalReportingConfig) {
+            boolean configured = true;
 
-        // Some parsers use alternate property for lexical handling (grr...)
-        if (!lexicalReporting) {
+            // Setup lexical reporting.
+            boolean lexicalReporting = false;
             try {
-                parser.setProperty(
-                    "http://xml.org/sax/properties/lexical-handler",
-                    contentHandler);
+                parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
+                                   contentHandler);
                 lexicalReporting = true;
             } catch (SAXNotSupportedException e) {
                 // No lexical reporting available
+                configured = false;
             } catch (SAXNotRecognizedException e) {
                 // No lexical reporting available
+                configured = false;
+            }
+
+            // Some parsers use alternate property for lexical handling (grr...)
+            if (!lexicalReporting) {
+                try {
+                    parser.setProperty(
+                        "http://xml.org/sax/properties/lexical-handler",
+                        contentHandler);
+                    lexicalReporting = true;
+                } catch (SAXNotSupportedException e) {
+                    // No lexical reporting available
+                    configured = false;
+                } catch (SAXNotRecognizedException e) {
+                    // No lexical reporting available
+                    configured = false;
+                }
+            }
+
+            /* If unable to configure this property and fastReconfigure is
+             * enabled, then setup to avoid this code path entirely next time.
+             */
+            if (!configured && fastReconfigure) {
+                tryLexicalReportingConfig=false;
             }
         }
 
-        // Try setting the DeclHandler if entity expansion is off
-        if (!expand) {
-            try {
-                parser.setProperty(
-                    "http://xml.org/sax/properties/declaration-handler",
-                    contentHandler);
-            } catch (SAXNotSupportedException e) {
-                // No lexical reporting available
-            } catch (SAXNotRecognizedException e) {
-                // No lexical reporting available
+        /* If fastReconfigure is enabled and we failed in the previous attempt
+         * in configuring entity expansion, then skip this step.
+         */
+        if (tryEntityExpandConfig) {
+            boolean configured = true;
+
+            // Try setting the DeclHandler if entity expansion is off
+            if (!expand) {
+                try {
+                    parser.setProperty(
+                        "http://xml.org/sax/properties/declaration-handler",
+                        contentHandler);
+                } catch (SAXNotSupportedException e) {
+                    // No lexical reporting available
+                    configured = false;
+                } catch (SAXNotRecognizedException e) {
+                    // No lexical reporting available
+                    configured = false;
+                }
+            }
+
+            /* If unable to configure this property and fastReconfigure is
+             * enabled, then setup to avoid this code path entirely next time.
+             */
+            if (!configured && fastReconfigure) {
+                tryEntityExpandConfig=false;
             }
         }
     }
_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...
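The "more generic manner" mentioned in the message could look roughly like the sketch below: remember each rejected property name individually and skip it on later passes. This is an illustration under stated assumptions, not the patch and not JDOM code. `PropertySupportCache`, `trySetProperty`, `CountingReader` and `PropertySupportCacheDemo` are hypothetical names, and `CountingReader` is a test double built on `org.xml.sax.helpers.XMLFilterImpl` rather than a real parser.

```java
import java.util.HashSet;
import java.util.Set;

import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical helper: remembers which property names a reader rejected. */
class PropertySupportCache {
    private final Set<String> unsupported = new HashSet<String>();

    /**
     * Sets the property unless an earlier call already failed for this name.
     * Returns true if the reader accepted the property on this call.
     */
    boolean trySetProperty(XMLReader reader, String name, Object value) {
        if (unsupported.contains(name)) {
            return false; // known failure: skip the call and its exception
        }
        try {
            reader.setProperty(name, value);
            return true;
        } catch (SAXNotRecognizedException e) {
            unsupported.add(name);
        } catch (SAXNotSupportedException e) {
            unsupported.add(name);
        }
        return false;
    }
}

/** Test double: accepts exactly one property name and counts every call. */
class CountingReader extends XMLFilterImpl {
    int calls = 0;
    private final String accepted;

    CountingReader(String accepted) {
        this.accepted = accepted;
    }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        calls++;
        if (!name.equals(accepted)) {
            throw new SAXNotRecognizedException("Property: " + name);
        }
    }
}

class PropertySupportCacheDemo {
    public static void main(String[] args) {
        String primary = "http://xml.org/sax/handlers/LexicalHandler";
        String alternate = "http://xml.org/sax/properties/lexical-handler";
        CountingReader reader = new CountingReader(alternate);
        PropertySupportCache cache = new PropertySupportCache();
        Object handler = new Object(); // stands in for the content handler

        // Three simulated build() calls, using the same fallback order as
        // configureParser(): primary name first, then the alternate.
        for (int i = 0; i < 3; i++) {
            if (!cache.trySetProperty(reader, primary, handler)) {
                cache.trySetProperty(reader, alternate, handler);
            }
        }
        // The primary name is attempted once; the alternate is set each pass.
        System.out.println("setProperty calls: " + reader.calls); // prints 4
    }
}
```

Tracking names separately means a fallback property that succeeds keeps being applied on every pass, which matters because the handler object changes per build. Treating SAXNotSupportedException as permanent mirrors the patch, but that exception can also signal a temporary condition, so a real implementation might cache only SAXNotRecognizedException.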
Re: Improving performance of SAX parser configuration

Hi Scott,

Thanks for sending in what looks like a really good improvement! I plan to add this to the codebase for the next release. If anyone has issues, speak up now.

-jh-

On May 7, 2009, at 10:44 AM, Scott Emmons wrote:

> Greetings jdom-interest,
>
> I've run across an interesting performance issue in the way JDOM
> handles Xerces parser configuration even when reuseParser is enabled
> in SAXBuilder, and I wanted to run this by the list - not only for
> validation, but hopefully something along the lines of this
> improvement can get rolled in (yep, I know JDOM is in maintenance
> mode).
>
> For a bit of background, the particular case we have involves parsing
> lots and lots of little XML document fragments via SAXBuilder.build()
> - not terribly efficient, but for what we use JDOM for it's a
> pre-existing condition that we're stuck with.
>
> What I found is that more time was spent in configureParser() than in
> actually parsing the XML. The reason for this is attempting to set
> options on the parser which don't exist in Xerces - or at least the
> version of it we are using. This results in
> SAXNotRecognizedExceptions. Exceptions are expensive, plus Xerces does
> ResourceBundle lookups each time. While we do set reuseParser, each
> execution of build() still reconfigures the underlying parser.
>
> I know that the contentHandler, and perhaps other options, are not
> reusable, and this doesn't change the semantics of that. Since the
> underlying parser is unlikely to suddenly start supporting some option
> it didn't support before, it's possible to remember whether or not the
> underlying parser implementation was able to support a property, and
> skip attempting to configure it if not. I wired this as a specific
> option only used with reuseParser to be safe, but it's possible this
> could be done in a more generic manner that would benefit other
> codepaths and usages as well (it would simplify my patch somewhat, but I
> wanted to be safe since there may be other consequences of this which
> I've overlooked).
>
> Again, I wouldn't expect this to help cases where larger XML is
> handled less frequently, but for my case where it's hundreds of XML
> fragments per transaction per second, this fix reduces the execution
> time of SAXBuilder.build() by about half.
>
> I would love to hear any feedback as well as find out if anyone else
> has the same sort of performance improvements I've seen with this
> patch in cases where lots of small documents are parsed.
>
> Thanks for your time,
> -Scott