Improving performance of SAX parser configuration

by Bugzilla from lscotte@gmail.com

Greetings jdom-interest,

I've run across an interesting performance issue in the way JDOM
handles Xerces parser configuration even when reuseParser is enabled
in SAXBuilder, and I wanted to run this by the list - not only for
validation, but hopefully something along the lines of this
improvement can get rolled in (yep, I know JDOM is in maintenance
mode).

For a bit of background, the particular case we have involves parsing
lots and lots of little XML document fragments via SAXBuilder.build()
- not terribly efficient, but for what we use JDOM for it's a
pre-existing condition that we're stuck with.

What I found is that more time was spent in configureParser() than in
actually parsing the XML. The reason is that configureParser() attempts
to set properties on the parser which don't exist in Xerces - or at
least in the version of it we are using. Each attempt results in a
SAXNotRecognizedException. Exceptions are expensive, plus Xerces does
ResourceBundle lookups each time. While we do set reuseParser, each
execution of build() still reconfigures the underlying parser.
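
To make the failure path concrete, here is a minimal sketch, not taken
from JDOM, of what each rejected setProperty() call looks like from the
caller's side. It assumes the JAXP parser bundled with the JDK;
PropertyProbe, trySetProperty and newReader are hypothetical names, and
the example.invalid URI is a made-up property used only to force the
rejection.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper: reports whether a parser accepts a property. */
class PropertyProbe {

    /** Returns true if the property was set, false if the parser rejected it. */
    static boolean trySetProperty(XMLReader reader, String name, Object value) {
        try {
            reader.setProperty(name, value);
            return true;
        } catch (SAXNotRecognizedException e) {
            // The parser does not know this property name at all.
            return false;
        } catch (SAXNotSupportedException e) {
            // The parser knows the name but cannot honour this request.
            return false;
        }
    }

    /** Creates a reader from whatever JAXP implementation is available. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        DefaultHandler2 handler = new DefaultHandler2(); // implements LexicalHandler

        // Standard SAX2 property name for the lexical handler.
        System.out.println(trySetProperty(reader,
                "http://xml.org/sax/properties/lexical-handler", handler));

        // Made-up name: the exception is constructed and thrown on every call.
        System.out.println(trySetProperty(reader,
                "http://example.invalid/no-such-property", handler));
    }
}
```

Every call that ends in one of the catch blocks pays for building and
throwing the exception, which is the cost that repeats on each build().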

I know that the contentHandler, and perhaps other options, are not
reusable, and this doesn't change the semantics of that. Since the
underlying parser is unlikely to suddenly start supporting an option
it didn't support before, it's possible to remember whether or not the
underlying parser implementation was able to support a property, and
skip attempting to configure it if not. I wired this as a specific
option only used with reuseParser to be safe, but it's possible this
could be done in a more generic manner that would benefit other
codepaths and usages as well (it would simplify my patch somewhat, but I
wanted to be safe since there may be other consequences of this which
I've overlooked).
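
As a rough illustration of the "more generic manner" mentioned above,
the remembering could be keyed by property name instead of using one
flag per property. This is only a sketch under assumptions, not what the
patch below does: RememberingConfigurer, setIfSupported and demo are
hypothetical names, the example.invalid URI is made up, and treating
SAXNotSupportedException as permanent is an assumption that may not
hold for every parser.

```java
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical sketch: remembers which properties a reused parser rejected. */
class RememberingConfigurer {
    private final Set<String> rejected = new HashSet<String>();
    private int attempts = 0;

    /** Sets the property unless an earlier call already found it unsupported. */
    boolean setIfSupported(XMLReader reader, String name, Object value) {
        if (rejected.contains(name)) {
            return false; // skip the call, so no exception is thrown
        }
        attempts++;
        try {
            reader.setProperty(name, value);
            return true;
        } catch (SAXNotRecognizedException e) {
            rejected.add(name);
        } catch (SAXNotSupportedException e) {
            rejected.add(name); // assumption: the rejection is permanent
        }
        return false;
    }

    /** Number of real setProperty() calls made so far. */
    int attempts() {
        return attempts;
    }

    /** Reconfigures one reused reader `builds` times; returns the real call count. */
    static int demo(int builds) {
        XMLReader reader;
        try {
            reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
        RememberingConfigurer configurer = new RememberingConfigurer();
        for (int i = 0; i < builds; i++) {
            // A fresh handler per build, as the contentHandler is not reusable.
            DefaultHandler2 handler = new DefaultHandler2();
            configurer.setIfSupported(reader,
                    "http://example.invalid/no-such-property", handler);
            configurer.setIfSupported(reader,
                    "http://xml.org/sax/properties/lexical-handler", handler);
        }
        return configurer.attempts();
    }

    public static void main(String[] args) {
        System.out.println(demo(100));
    }
}
```

A property that the parser accepts is still set on every pass, because
the new handler has to be installed each time; only the rejected name is
skipped after its first failure.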

Again, I wouldn't expect this to help cases where larger XML is
handled less frequently, but for my case where it's hundreds of XML
fragments per transaction per second, this fix reduces the execution
time of SAXBuilder.build() by about half.

I would love to hear any feedback as well as find out if anyone else
has the same sort of performance improvements I've seen with this
patch in cases where lots of small documents are parsed.

Thanks for your time,
-Scott

===CUT HERE===
diff --git a/src/java/org/jdom/input/SAXBuilder.java b/src/java/org/jdom/input/SAXBuilder.java
index 09fbb00..1627345 100644
--- a/src/java/org/jdom/input/SAXBuilder.java
+++ b/src/java/org/jdom/input/SAXBuilder.java
@@ -134,6 +134,15 @@ public class SAXBuilder {
     /** User-specified properties to be set on the SAX parser */
     private HashMap properties = new HashMap(5);

+    /** Whether to use fast parser reconfiguration */
+    private boolean fastReconfigure = false;
+
+    /** Whether to try lexical reporting in fast parser reconfiguration */
+    private boolean tryLexicalReportingConfig = true;
+
+    /** Whether to try entity expansion in fast parser reconfiguration */
+    private boolean tryEntityExpandConfig = true;
+
     /**
      * Whether parser reuse is allowed.
      * <p>Default: <code>true</code></p>
@@ -396,6 +405,25 @@ public class SAXBuilder {
     }

     /**
+     * Specifies whether this builder will do fast reconfiguration of the
+     * underlying SAX parser when reuseParser is true. This improves
+     * performance in cases where SAXBuilders are reused and lots of small
+     * documents are frequently parsed. This avoids attempting to set features
+     * on the SAX parser each time build() is called that result in
+     * SAXNotRecognizedExceptions. This should ONLY be set for builders where
+     * this specific case is an issue. The default value of this setting is
+     * <code>false</code> (no fast reconfiguration). If reuseParser is false,
+     * calling this has no effect.
+     *
+     * @param fastReconfigure Whether to do fast parser reconfiguration.
+     */
+    public void setFastReconfigure(boolean fastReconfigure) {
+        if (this.reuseParser) {
+            this.fastReconfigure = fastReconfigure;
+        }
+    }
+
+    /**
      * This sets a feature on the SAX parser. See the SAX documentation for
      * </p>
      * <p>
@@ -657,42 +685,76 @@ public class SAXBuilder {
              parser.setErrorHandler(new BuilderErrorHandler());
         }

-        // Setup lexical reporting.
-        boolean lexicalReporting = false;
-        try {
-            parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
-                               contentHandler);
-            lexicalReporting = true;
-        } catch (SAXNotSupportedException e) {
-            // No lexical reporting available
-        } catch (SAXNotRecognizedException e) {
-            // No lexical reporting available
-        }
+        /* If fastReconfigure is enabled and we failed in the previous attempt
+         * in configuring lexical reporting, then skip this step.
+         */
+        if (tryLexicalReportingConfig) {
+            boolean configured = true;

-        // Some parsers use alternate property for lexical handling (grr...)
-        if (!lexicalReporting) {
+            // Setup lexical reporting.
+            boolean lexicalReporting = false;
             try {
-                parser.setProperty(
-                    "http://xml.org/sax/properties/lexical-handler",
-                    contentHandler);
+                parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
+                                   contentHandler);
                 lexicalReporting = true;
             } catch (SAXNotSupportedException e) {
                 // No lexical reporting available
+                configured = false;
             } catch (SAXNotRecognizedException e) {
                 // No lexical reporting available
+                configured = false;
+            }
+
+            // Some parsers use alternate property for lexical handling (grr...)
+            if (!lexicalReporting) {
+                try {
+                    parser.setProperty(
+                        "http://xml.org/sax/properties/lexical-handler",
+                        contentHandler);
+                    lexicalReporting = true;
+                } catch (SAXNotSupportedException e) {
+                    // No lexical reporting available
+                    configured = false;
+                } catch (SAXNotRecognizedException e) {
+                    // No lexical reporting available
+                    configured = false;
+                }
+            }
+
+            /* If neither property could be set and fastReconfigure is
+             * enabled, then setup to avoid this code path entirely next time.
+             */
+            if (!lexicalReporting && fastReconfigure) {
+                tryLexicalReportingConfig=false;
             }
         }

-        // Try setting the DeclHandler if entity expansion is off
-        if (!expand) {
-            try {
-                parser.setProperty(
-                    "http://xml.org/sax/properties/declaration-handler",
-                    contentHandler);
-            } catch (SAXNotSupportedException e) {
-                // No lexical reporting available
-            } catch (SAXNotRecognizedException e) {
-                // No lexical reporting available
+        /* If fastReconfigure is enabled and we failed in the previous attempt
+         * in configuring entity expansion, then skip this step.
+         */
+        if (tryEntityExpandConfig) {
+            boolean configured = true;
+
+            // Try setting the DeclHandler if entity expansion is off
+            if (!expand) {
+                try {
+                    parser.setProperty(
+                        "http://xml.org/sax/properties/declaration-handler",
+                        contentHandler);
+                } catch (SAXNotSupportedException e) {
+                    // No declaration handler available
+                    configured = false;
+                } catch (SAXNotRecognizedException e) {
+                    // No declaration handler available
+                    configured = false;
+                }
+            }
+
+            /* If unable to configure this property and fastReconfigure is
+             * enabled, then setup to avoid this code path entirely next time.
+             */
+            if (!configured && fastReconfigure) {
+                tryEntityExpandConfig=false;
             }
         }
     }
_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...

Re: Improving performance of SAX parser configuration

by jhunter

Hi Scott,

Thanks for sending in what looks like a really good improvement!  I  
plan to add this to the codebase for the next release.  If anyone has  
issues, speak up now.

-jh-

On May 7, 2009, at 10:44 AM, Scott Emmons wrote:

> Greetings jdom-interest,
> [...]
