Getting optimal performance for CachedXpathApi searches and DOM parsing


by Sandeep Takhar

Hi.

I am new to the list, so apologies if I get something wrong.  I searched the existing archives but could not find the answer I am looking for.  Thanks for any help you can offer.

We are seeing some small performance issues in our production environment, and I am working on reproducing them.  My questions are about these issues and possible workarounds.

1. CachedXPathAPI is working well for us, but I do see that the constructor is expensive.  I read in some IBM documentation that if you don't reuse parsers, construction is expensive because it tries to find a factory by searching jar files (looping through the classpath).  In fact, I see this happening when I look at the thread dump.  All the documentation and comments indicate that if you change the source document, you need a new CachedXPathAPI.  I see a few options that I am not sure will work.

a) cachedXPathAPI.getXPathContext().reset().  Can I call this instead of creating a new CachedXPathAPI, and then use the same cachedXPathAPI object to search a new document?  I will still make sure the cachedXPathAPI is only used by a single thread, but I don't want to call the constructor.
b) There is also a constructor that takes another CachedXPathAPI: CachedXPathAPI(CachedXPathAPI cachedXPathAPI).  Is this of some use?
c) Can I use compiled XPath expressions to achieve a similar effect to CachedXPathAPI, without having to construct a CachedXPathAPI?  Does someone have a quick sample?
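
For question 1c, one possible direction is the standard JAXP javax.xml.xpath API rather than Xalan's own classes. The sketch below is only an illustration under that assumption: the class name CompiledXPathDemo, the sample expression, and the sample XML are all hypothetical. It compiles an expression once and evaluates it against two separately parsed documents. XPathExpression and DocumentBuilder are not thread-safe, so an instance of this class would need to stay on one thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: one instance per thread, reused for many documents.
class CompiledXPathDemo {
    private final XPathExpression expr;     // compiled once, in the constructor
    private final DocumentBuilder builder;  // created once, reused for every parse

    CompiledXPathDemo() {
        try {
            // The factory lookups happen here, not on every search.
            expr = XPathFactory.newInstance().newXPath().compile("/order/item[1]/@name");
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (XPathExpressionException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses a new document and runs the precompiled expression against it.
    String firstItemName(String xml) {
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return (String) expr.evaluate(doc, XPathConstants.STRING);
        } catch (SAXException | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompiledXPathDemo demo = new CompiledXPathDemo();
        System.out.println(demo.firstItemName("<order><item name='a'/></order>"));
        System.out.println(demo.firstItemName("<order><item name='b'/><item name='c'/></order>"));
    }
}
```

Because the expression is evaluated against whichever Document is passed in, nothing has to be rebuilt when the source document changes. Whether this performs as well as CachedXPathAPI for a given workload would need to be measured.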

2. We are using the Apache DOM parser.  Basically, the DOMParser is the default Apache one (version 2.7.0).  Currently we create the builder from the factory each time and then call parse: DocumentBuilderFactory.newDocumentBuilder()... I cannot remember the exact syntax.  We certainly do not need to create the DocumentBuilder each time, and I am suggesting reusing it as a fix.

What I see is some minor performance issues in production in the parse method.  I don't have the thread dump handy, but it is always spending time in the DocumentScanner$DTDDispatcher.dispatch method (may not be the exact name), in what looks like all the methods of the declaration handler.  I have to double-check that, but the methods are named like those of the SAX declaration handler, even though we are not using SAX.  We have an entityId as the second line of the XML file that points to a DTD, and that DTD file contains only a single line.  I have tried to understand the code but haven't been able to figure it out.

a) Will I stop seeing time spent in DTDDispatcher in the threads if I remove the entity line from the source XML (so there is no reference to a DTD)?
b) Is what I'm seeing completely normal?
c) Can I set a property that will stop the DTDDispatcher.dispatch method from being called?
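
The two ideas in question 2 (reusing the builder, and avoiding work on the DTD) can be sketched together. This is a hedged illustration, not a confirmed fix: the class name ReusableDomParser is hypothetical, and the feature URI is the Xerces-specific load-external-dtd feature, which is assumed to be supported by the parser in use. That feature only stops a non-validating parser from fetching the external DTD; whether it removes the DTDDispatcher time seen in the thread dump is not established here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: one factory for the process, one builder per thread.
class ReusableDomParser {
    // Xerces-specific feature (assumption: the parser in use recognises it).
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // The factory lookup (the classpath search) happens once, here.
    private static final DocumentBuilderFactory FACTORY = createFactory();

    // DocumentBuilder is not thread-safe, so each thread keeps its own.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
        ThreadLocal.withInitial(ReusableDomParser::createBuilder);

    private static DocumentBuilderFactory createFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        try {
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("feature not supported: " + LOAD_EXTERNAL_DTD, e);
        }
        return factory;
    }

    private static DocumentBuilder createBuilder() {
        // The factory itself is not guaranteed to be thread-safe.
        synchronized (FACTORY) {
            try {
                return FACTORY.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    static Document parse(String xml) {
        DocumentBuilder builder = BUILDER.get();
        builder.reset(); // back to the state it had when the factory created it
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The DTD path is deliberately one that does not exist.
        String xml = "<!DOCTYPE r SYSTEM \"file:///nonexistent/missing.dtd\"><r/>";
        System.out.println(parse(xml).getDocumentElement().getTagName());
    }
}
```

The per-thread builder is a design choice that matches the single-thread usage described in question 1; a pool of builders would be another option.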



Here are the methods where I see time being spent, along with things like String.intern().  Maybe it is completely normal, but that doesn't explain why it is slower in production, unless production load is causing it, and that would be fine.

public void elementDecl(String name, String model)
    throws SAXException;
public void attributeDecl(String elementName, String attributeName,
    String type, String mode, String defaultValue) throws SAXException;
public void internalEntityDecl(String name, String value)
    throws SAXException;
public void externalEntityDecl(String name, String publicID,
    String systemID) throws SAXException;
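
The four signatures above match the callbacks of the SAX2 extension interface org.xml.sax.ext.DeclHandler, which a parser invokes for declarations it reads from a DTD. As a small illustration of when those callbacks fire, the sketch below registers a handler on a SAX parser and records each declaration event. The class name DeclEventsDemo and the sample XML are hypothetical, and this shows the SAX-level behaviour only; it does not reproduce the internals of the DOM parser from the thread dump.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: lists the DTD declaration callbacks seen during a parse.
class DeclEventsDemo {
    static List<String> declarationEvents(String xml) {
        final List<String> events = new ArrayList<>();
        // DefaultHandler2 implements DeclHandler with empty methods; override all four.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void elementDecl(String name, String model) {
                events.add("elementDecl:" + name);
            }
            @Override
            public void attributeDecl(String eName, String aName, String type,
                                      String mode, String value) {
                events.add("attributeDecl:" + eName + "/" + aName);
            }
            @Override
            public void internalEntityDecl(String name, String value) {
                events.add("internalEntityDecl:" + name);
            }
            @Override
            public void externalEntityDecl(String name, String publicId, String systemId) {
                events.add("externalEntityDecl:" + name);
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Standard SAX2 property for registering a DeclHandler.
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        String withDtd = "<!DOCTYPE r [<!ELEMENT r (#PCDATA)>"
            + "<!ATTLIST r a CDATA #IMPLIED><!ENTITY e 'v'>]><r/>";
        System.out.println(declarationEvents(withDtd));
        System.out.println(declarationEvents("<r/>"));
    }
}
```

A document with no DOCTYPE produces no declaration events in this sketch, which is consistent with the idea in question 2a, although the effect on the production timings would still need to be measured.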






Re: Getting optimal performance for CachedXpathApi searches and DOM parsing

by Sandeep Takhar

For point #1 of my original message, I have dug around a bit more, and the CachedXPathAPI code is relatively easy to understand.  I can basically mimic the behaviour of CachedXPathAPI myself, which lets me precompile my XPath expressions.  I can also just create an XPathContext(false) and call reset() on it, just like the code does.  This will work around my problem for #1.

CachedXPathAPI()
{
  xpathSupport = new CachedXPathAPI(false);
}

public XObject eval(Node contextNode, String str, Node namespaceNode)
        throws TransformerException
    {
        // Resolve prefixes against the document element when a Document node is passed.
        PrefixResolverDefault prefixResolver = new PrefixResolverDefault(
            namespaceNode.getNodeType() == Node.DOCUMENT_NODE
                ? ((Document) namespaceNode).getDocumentElement()
                : namespaceNode);
        // The expression string is compiled on every call.
        XPath xpath = new XPath(str, null, prefixResolver, XPath.SELECT, null);
        int ctxtNode = xpathSupport.getDTMHandleFromNode(contextNode);
        return xpath.execute(xpathSupport, ctxtNode, prefixResolver);
    }
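
The workaround described in this message relies on Xalan's own XPath and XPathContext classes. A rough analogue using only the standard javax.xml.xpath API might look like the sketch below, which keeps a per-thread cache of compiled expressions so that each expression string is compiled once per thread. The class PerThreadXPathCache and all of its method names are hypothetical, and it is not a drop-in replacement for CachedXPathAPI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: compiled expressions are cached per thread.
class PerThreadXPathCache {
    private static final class State {
        // Created once per thread; neither object is shared across threads.
        final XPath xpath = XPathFactory.newInstance().newXPath();
        final Map<String, XPathExpression> compiled = new HashMap<>();
    }

    private static final ThreadLocal<State> STATE = ThreadLocal.withInitial(State::new);

    // Evaluates the expression against the context node, compiling it on first use.
    static NodeList selectNodes(Node context, String expression) {
        State state = STATE.get();
        try {
            XPathExpression expr = state.compiled.get(expression);
            if (expr == null) {
                expr = state.xpath.compile(expression);
                state.compiled.put(expression, expr);
            }
            return (NodeList) expr.evaluate(context, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Number of expressions compiled so far on the calling thread.
    static int cachedExpressionCount() {
        return STATE.get().compiled.size();
    }

    // Convenience parser for the demo only; it creates a new builder on each call.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><b/><c/></a>");
        System.out.println(selectNodes(doc, "//b").getLength());
        System.out.println(selectNodes(doc, "//b").getLength()); // reuses the compiled expression
        System.out.println(cachedExpressionCount());
    }
}
```

The cache is unbounded, which is only reasonable when the set of expression strings is small and fixed.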


--- On Wed, 9/30/09, Sandeep Takhar <sandeep_takhar@...> wrote:

> From: Sandeep Takhar <sandeep_takhar@...>
> Subject: Getting optimal performance for CachedXpathApi searches and DOM parsing
> To: xalan-j-users@...
> Received: Wednesday, September 30, 2009, 9:24 PM

Re: Getting optimal performance for CachedXpathApi searches and DOM parsing

by Sandeep Takhar

Where I mentioned this before:

> CachedXPathAPI()
> {
>   xpathSupport = new CachedXPathAPI(false);
> }

I actually meant:

xpathSupport = new XPathContext(false);

Also, for #2 in my original message, I was reading this:

http://book.javanb.com/xml-and-java-developing-web-applications-2nd/0201770040_ch06lev1sec4.html

I think I can configure a minimal pipeline and try it out.  Is the DTDScanner a required element of the pipeline?  The DTD is empty, so it is not really used, and I'm thinking that only the DocScanner is required and nothing else needs to be in the pipeline.  My plan is to create my own configuration by extending BasicParserConfiguration and then select it with this system property: org.apache.xerces.xni.parser.XMLParserConfiguration






--- On Wed, 9/30/09, Sandeep Takhar <sandeep_takhar@...> wrote:

> From: Sandeep Takhar <sandeep_takhar@...>
> Subject: Re: Getting optimal performance for CachedXpathApi searches and DOM parsing
> To: xalan-j-users@...
> Received: Wednesday, September 30, 2009, 10:56 PM