[jira] Created: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by JIRA xerces-c-dev@xml.apache.org

DOMNamedNodeMapImpl::item() 10x preformance improvement
-------------------------------------------------------

         Key: XERCESC-1452
         URL: http://issues.apache.org/jira/browse/XERCESC-1452
     Project: Xerces-C++
        Type: Improvement
  Components: DOM  
    Versions: 2.6.0    
 Environment: All environments
    Reporter: Jeff Keasler


10-second bug fix -- change the MAP_SIZE constant in DOMNamedNodeMapImpl.hpp from 193 to 17.

I use literally millions of DOMNodes, each having 2-10 attributes, and DOMNamedNodeMapImpl::item() is horribly implemented.  It makes sense to fix the problem by changing the definition of MAP_SIZE to 17.  Even people with 50 attributes will get decent performance if you make this change, whereas the vast majority of people, who only use 5-10, will see up to a 10x performance improvement.

Thank you.
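
To make the cost described in this report concrete, here is a minimal sketch, assuming that item(index) has to walk a fixed array of hash buckets in order until it reaches the bucket holding the requested index. BucketedMap, enumerationCost and the bucketsVisited counter are hypothetical names used only for illustration; this is not the Xerces-C++ source.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy fixed-size bucketed map (hypothetical; not the Xerces-C++ class).
class BucketedMap {
public:
    explicit BucketedMap(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Stores a name in the bucket selected by its hash.
    void add(const std::string& name) {
        buckets_[std::hash<std::string>{}(name) % buckets_.size()].push_back(name);
        ++size_;
    }

    std::size_t size() const { return size_; }

    // Returns the index-th entry in bucket order, or nullptr if out of range.
    // bucketsVisited accumulates how many buckets were inspected on the way.
    const std::string* item(std::size_t index, std::size_t& bucketsVisited) const {
        std::size_t seen = 0;  // entries held by the buckets already skipped
        for (const auto& bucket : buckets_) {
            ++bucketsVisited;
            if (index < seen + bucket.size()) return &bucket[index - seen];
            seen += bucket.size();
        }
        return nullptr;
    }

private:
    std::vector<std::vector<std::string>> buckets_;
    std::size_t size_ = 0;
};

// Total number of buckets inspected when every entry is fetched via item(i).
std::size_t enumerationCost(const BucketedMap& map) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < map.size(); ++i) map.item(i, visited);
    return visited;
}
```

Under that assumption, enumerating n entries costs at most n times the bucket count, so a map with a handful of entries spends most of its time stepping over empty buckets. Whether this code path is actually reached when iterating attributes is a separate question.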

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@...
For additional commands, e-mail: c-dev-help@...


[jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by JIRA xerces-c-dev@xml.apache.org

    [ http://issues.apache.org/jira/browse/XERCESC-1452?page=comments#action_12314869 ]

Alberto Massari commented on XERCESC-1452:
------------------------------------------

Hi Jeff,
the DOMNamedNodeMapImpl class is not used to store attributes in an element (that's DOMAttrMapImpl); it is used to store the lists of entities, notations and elements in a DTD. Can you double-check why reducing the size of these 3 maps improves your performance?

Thanks,
Alberto



Re: [jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by Axel Weiß

Alberto Massari (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/XERCESC-1452?page=comments#action_12314869 ]
>
> Alberto Massari commented on XERCESC-1452:
> ------------------------------------------
>
> Hi Jeff,
> the DOMNamedNodeMapImpl class is not used to store attributes in an
> element (that's DOMAttrMapImpl); it is used to store the list of
> entities, notations and elements in a DTD. Can you double check why
> reducing the size of these 3 maps improves your performances?

Hi Alberto,

querying all attributes of an element is done by the loop:

DOMNamedNodeMap *map = node->getAttributes();
if (map){
        // the DOMNamedNodeMap accessor is getLength(), not Length()
        XMLSize_t i, size = map->getLength();
        for (i=0; i<size; ++i){
                DOMNode *attr = map->item(i);
                // ...
        }
}

As I understand it, the performance improvement here is made with respect
to the item(.) method (which is called size times, and that's why its
improvement is important), and not with respect to the internal
attribute handling of xerces.

Cheers,
                        Axel

--
Humboldt-Universität zu Berlin
Institut für Informatik
Signalverarbeitung und Mustererkennung
Dipl.-Inf. Axel Weiß
Rudower Chaussee 25
12489 Berlin-Adlershof
+49-30-2093-3050
** www.freesp.de **



Re: [jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by Alberto Massari

At 12.46 01/07/2005 +0200, Axel Weiß wrote:

>As I understand it, the performance improvement here is made with respect
>to the item(.) method (which is called size times, and that's why its
>improvement is important), and not with respect to the internal
>attribute handling of xerces.

Hi Axel,

that code queries the attributes through an
interface (DOMNamedNodeMap), but it is actually
talking to an object of type DOMAttrMapImpl; the
fix he suggests is for the class
DOMNamedNodeMapImpl, but that will never store
attributes, only nodes stored in the DTD.
DOMNamedNodeMapImpl::item() should be invoked
only when cloning the DTD node, or if he calls it directly in his own code.

Alberto
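
The distinction drawn in this reply, between the interface a caller sees and the concrete class behind it, can be sketched as follows. This is a minimal illustration with hypothetical stand-in types (NamedNodeMap, AttrMapImpl, DtdNodeMapImpl, Element, DocumentType); it only mirrors the relationship described in the message and is not the Xerces-C++ class hierarchy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical interface: the static type that calling code works with.
struct NamedNodeMap {
    virtual ~NamedNodeMap() = default;
    virtual std::size_t getLength() const = 0;
    virtual std::string implementationName() const = 0;
};

// Hypothetical implementation used for an element's attributes.
struct AttrMapImpl : NamedNodeMap {
    std::size_t getLength() const override { return 0; }
    std::string implementationName() const override { return "AttrMapImpl"; }
};

// Hypothetical implementation used only for DTD-level collections.
struct DtdNodeMapImpl : NamedNodeMap {
    std::size_t getLength() const override { return 0; }
    std::string implementationName() const override { return "DtdNodeMapImpl"; }
};

struct Element {
    AttrMapImpl attrs;
    // The caller receives the interface type, never the concrete class.
    NamedNodeMap* getAttributes() { return &attrs; }
};

struct DocumentType {
    DtdNodeMapImpl entities;
    NamedNodeMap* getEntities() { return &entities; }
};
```

Calling a virtual method through the pointer returned by getAttributes() dispatches to the attribute-map implementation, so tuning a constant in the DTD-level class would not change that path. In a real build, a breakpoint or profiler sample on the suspected method is one way to check which implementation is actually being hit.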





Re: [jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by Jeff Keasler (keasler@llnl.gov)


Alberto,

Thank you for getting back to me on this.

We write our DOM tree back out to disk after we parse it.

Below is the code we use to do that.  As you can see, we loop over the
attributes, but since there are 193 buckets to search through for each
attribute, it's taking a lot of time.

Our actual *overall* time improvement is a factor of two, but
theoretically, a reduction from 193 to 17 could be up to a factor of ten.

Thanks,
-Jeff


/* XNode inherits from DomNode */

static void XmluWriteNode(FILE* outFile, int depth, XNode* node) {
    char* nodeName;

    nodeName = node->GetNodeName();

    fprintf(outFile, "%s<%s", prefix[depth], nodeName);
    // print attribute list, if any
    DOMNamedNodeMap* attribs = node->getAttributes();
    if (attribs != NULL) {
       XMLSize_t numAttribs = attribs->getLength();
       for (XMLSize_t i = 0; i < numAttribs; i++) {
          DOMNode* attr = attribs->item(i);
          // transcode() allocates a new string; it is released below
          char* attrName = XMLString::transcode(attr->getNodeName());
          char* attrValue;
          node->GetAttributeValue(attrName, attrValue);
          fprintf(outFile, " %s=\"%s\"", attrName, attrValue);
          XMLString::release(&attrName);
          XMLString::release(&attrValue);
       }
    }
    if (node->hasChildNodes()) {
        /* recursive stuff */
    }
}
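
As a rough consistency check on the factor-of-two and factor-of-ten figures in this message, Amdahl's law relates an overall speedup to the fraction of run time spent in the part that got faster. The sketch below assumes that lookup time scales linearly with the bucket count and that the count drops from 193 (the value given in the issue description) to 17; both are assumptions, not measurements from this thread, and the function names are hypothetical.

```cpp
#include <cmath>

// Amdahl's law: overall = 1 / ((1 - p) + p / local),
// where p is the fraction of run time in the part that was sped up.
double overallSpeedup(double fraction, double localSpeedup) {
    return 1.0 / ((1.0 - fraction) + fraction / localSpeedup);
}

// Solves the same relation for p, given both speedups.
double fractionInHotPath(double overall, double localSpeedup) {
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / localSpeedup);
}
```

With a local speedup of 193/17 (about 11.4) and an overall speedup of 2, fractionInHotPath gives roughly 0.55. In other words, under these assumptions a bit over half of the run time would have to be spent in the enumeration for the reported numbers to fit together.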






Re: [jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by Jeff Keasler (keasler@llnl.gov)


Just so you know, I grabbed a random piece of code that was using that
attributes loop.  The code base I'm working on came from a team member
who just left, and I'm just trying to speed it up.

Thanks,
-Jeff




Re: [jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by Axel Weiß

Jeff Keasler wrote:
> Just so you know, I grabbed a random piece of code that was using that
> attributes loop.  The code base I'm working on came from a team member
> who just left, and I'm just trying to speed it up.

Hi Jeff,

I'd suggest trying DOMWriter::writeToString(.) to speed this up.

Cheers,
                        Axel



Re: [jira] Commented: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by Alberto Massari

Hi Jeff,
I must repeat myself for the third time: the DOMNamedNodeMapImpl
class (that uses the map with 193 buckets) is used only in the DTD
node, not in the elements. Would you mind double checking your
implementation of the XNode class?

Alberto

At 09.47 01/07/2005 -0700, Jeff Keasler wrote:

>Alberto,
>
>Thank you for getting back to me on this.
>
>We write our DOM tree back out to disk after we parse it.
>
>Below is the code we use to do that.  As you can see, we loop over
>the attributes, but since there are 193 buckets to search through
>for each attribute, it's taking a lot of time.
>
>Our actual *overall* time improvement is a factor of two, but
>theoretically, a reduction from 193 to 17 could be up to a factor of ten.
>
>Thanks,
>-Jeff


[jira] Updated: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by JIRA xerces-c-dev@xml.apache.org


     [ https://issues.apache.org/jira/browse/XERCESC-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Kolpackov updated XERCESC-1452:
-------------------------------------

    Affects Version/s:     (was: 2.6.0)
                       3.0.1
        Fix Version/s: 3.1.0
             Assignee: Boris Kolpackov

MAP_SIZE is the number of buckets in the hash table used to store things like entities, etc., as well as attributes. I also tend to think that 193 for an attribute map is a bit too much, since in most cases we don't have more than a few attributes. This implementation also does not support rehashing, so I wonder why we don't use one of the HashMap templates. I will look into this.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




[jira] Closed: (XERCESC-1452) DOMNamedNodeMapImpl::item() 10x preformance improvement

by JIRA xerces-c-dev@xml.apache.org


     [ https://issues.apache.org/jira/browse/XERCESC-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Kolpackov closed XERCESC-1452.
------------------------------------

    Resolution: Won't Fix

Actually, Alberto is right. DOMNamedNodeMapImpl is only used to store entities, etc., in DOMDocumentTypeImpl. I tested and changing the MAP_SIZE value does not affect performance in any noticeable way.
