method startElement() from class DOMLSParserFilter

method startElement() from class DOMLSParserFilter

by Mirko Braun-2

Hello everybody,

I would like to parse a fairly large XML file (about 180 MB).
I used the DOM interface because I need the tree for further
processing of the data the XML file contains. Of course, parsing
the file uses a lot of memory, and I got an "Out of memory" exception.

I noticed that Xerces-C++ 3.0.1 (Win32) comes with a class DOMLSParserFilter,
which makes it possible to filter nodes during parsing.
That is perfect for me because one XML element in my large file
contains most of the data. This XML element is called DATA and
appears several times in my XML file.
So I had the idea to reject this XML element from the DOM tree
during parsing, using the method startElement() of the
DOMLSParserFilter class, to reduce the memory used. After that I would
use a SAX parser and just get all DATA elements with their values.
But it does not work.
I integrated my code into the DOMPrint example that comes
with Xerces-C++ 3.0.1. The following error message occurred:

DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
DOMException code is:  3
Message is: attempt is made to insert a node where it is not permitted


Did I misunderstand the functionality of the DOMLSParserFilter class
and its method startElement()?
Is it possible to realize my idea with the help of this class? Did
I do something wrong in my code (please have a look below)?

I would be very grateful for any help.

Thanks in advance,
Mirko


DOMPrintFilter.hpp:
--------------------


// Parser filter that decides, per node, what ends up in the DOM tree.
class DOMParserFilter : public DOMLSParserFilter {
public:
    DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
    ~DOMParserFilter() {}

    // Called at the start tag, before any children have been parsed.
    virtual FilterAction startElement(DOMElement* node);
    // Called once a node is complete; this version accepts everything.
    virtual FilterAction acceptNode(DOMNode* node) { return DOMParserFilter::FILTER_ACCEPT; }
    virtual DOMNodeFilter::ShowType getWhatToShow() const { return fWhatToShow; }

private:
    DOMNodeFilter::ShowType fWhatToShow;
};


DOMPrintFilter.cpp:
--------------------

DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
:fWhatToShow(whatToShow)
{}

DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
{
    // Reject every element named "DATA".
    // element_data: XMLCh string holding "DATA" (defined elsewhere).
    if (XMLString::compareString(node->getNodeName(), element_data) == 0)
        return DOMParserFilter::FILTER_REJECT;
    else
        return DOMParserFilter::FILTER_ACCEPT;
}


DOMPrint.cpp:
---------------

static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };

xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);

xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);



DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
   
DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
parser->setFilter(pDOMParserFilter);
   

    //
    //  Parse the XML file, catching any XML exceptions that might propagate
    //  out of it.
    //
    bool errorsOccured = false;
    DOMDocument *doc = NULL;

    try
    {
      doc = parser->parseURI(gXmlFile);
    }
    catch (const OutOfMemoryException&)
    {
        XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
        errorsOccured = true;
    }
    catch (const XMLException& e)
    {
        XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
             << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
        errorsOccured = true;
    }

    catch (const DOMException& e)
    {
      const unsigned int maxChars = 2047;
      XMLCh errText[maxChars + 1];

      XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
           << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;

      if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
           XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;

      errorsOccured = true;
    }

    catch (...)
    {
        XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
        errorsOccured = true;
    }
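
The second half of the plan above (a SAX pass that only collects the DATA elements) can be sketched without Xerces. The following is a minimal model, assuming the parser delivers start, text and end events in document order; `Event`, `EventKind` and `collectData` are hypothetical names invented for this sketch and are not part of the Xerces API. In a real program the same logic would presumably live in the startElement, characters and endElement callbacks of a SAX handler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for parser callbacks; none of these names are Xerces API.
enum class EventKind { Start, Text, End };

struct Event {
    EventKind kind;
    std::string value;  // element name for Start/End, character data for Text
};

// SAX-style pass: keep only the text found inside <DATA> elements,
// one string per DATA element, without building any tree.
std::vector<std::string> collectData(const std::vector<Event>& events,
                                     const std::string& wanted = "DATA") {
    std::vector<std::string> out;
    int depthInside = 0;  // > 0 while we are inside a wanted element
    for (const Event& ev : events) {
        switch (ev.kind) {
        case EventKind::Start:
            if (depthInside > 0) {
                ++depthInside;               // nested element inside DATA
            } else if (ev.value == wanted) {
                depthInside = 1;             // entering a DATA element
                out.push_back(std::string());
            }
            break;
        case EventKind::Text:
            if (depthInside > 0) out.back() += ev.value;
            break;
        case EventKind::End:
            if (depthInside > 0) --depthInside;
            break;
        }
    }
    assert(depthInside == 0);  // well-formed input closes every element
    return out;
}
```

Because nothing outside the current DATA element is stored, the memory needed by this pass depends on the size of the DATA contents that are kept, not on the size of the whole file.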




Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

Hi Mirko,
I think the current implementation of the DOMLSParserFilter doesn't work
nicely with your code, as the rejected nodes are not recycled and the
memory will grow to the same level as before.
Anyhow, you should instead override acceptNode like this:

DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
{
  // Reject every element whose name is "DATA"
  if (node->getNodeType() == DOMNode::ELEMENT_NODE &&
      XMLString::compareString(node->getNodeName(), element_data) == 0)
    return DOMParserFilter::FILTER_REJECT;
  else
    return DOMParserFilter::FILTER_ACCEPT;
}

Then, change DOMLSParserImpl::endElement to add a call to
origNode->release() after the call to removeChild().

Alberto
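
The point about rejected nodes not being recycled can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `Document` and `parseRejected` are hypothetical names, and the model merely assumes that a document owns every node it ever creates and can reuse a node once it has been released. It is not a description of the actual Xerces internals.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy model only: the names below are hypothetical and are not Xerces classes.
struct Node {
    std::string name;
};

class Document {
public:
    // Reuse a released node if one is available; otherwise grow the heap.
    Node* createElement(const std::string& name) {
        Node* n;
        if (!recycled_.empty()) {
            n = recycled_.back();
            recycled_.pop_back();
        } else {
            heap_.push_back(std::unique_ptr<Node>(new Node()));
            n = heap_.back().get();
        }
        n->name = name;
        return n;
    }

    // Hand a detached node back so that a later createElement can reuse it.
    void release(Node* n) {
        assert(n != nullptr);
        recycled_.push_back(n);
    }

    // Number of nodes ever allocated; this is what drives peak memory.
    std::size_t allocated() const { return heap_.size(); }

private:
    std::vector<std::unique_ptr<Node>> heap_;  // owns every node until the document dies
    std::vector<Node*> recycled_;              // released nodes waiting for reuse
};

// Create and immediately discard `count` rejected elements.
std::size_t parseRejected(std::size_t count, bool releaseAfterRemove) {
    Document doc;
    for (std::size_t i = 0; i < count; ++i) {
        Node* n = doc.createElement("DATA");
        // The filter rejects the node here; the parser detaches it from the tree.
        if (releaseAfterRemove) doc.release(n);
    }
    return doc.allocated();
}
```

In this model, rejecting 1000 elements without releasing them still allocates 1000 nodes, while releasing each one after it is detached keeps the allocation at a single node. That is the effect the suggested release() call after removeChild() is meant to achieve.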




Re: method startElement() from class DOMLSParserFilter

by Mirko Braun-2


Hi Alberto,

Thank you for your answer. I integrated the changes you
suggested, but the result is still the same:

DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
DOMException code is:  3
Message is: attempt is made to insert a node where it is not permitted

Best regards,
Mirko


RE: method startElement() from class DOMLSParserFilter

by John Lilley

Forgive my ignorance, but could it be that you must reject not only the node you don't want, but all of its children as well?

john
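
For context on the question above: the W3C DOM Level 3 Load and Save specification describes a reject action as dropping the node together with its children, and a skip action as dropping only the node while its children are still considered. The sketch below models that difference on a toy tree. `Action`, `Elem`, `Filter`, `filterInto` and `countElems` are hypothetical names used only here and are not the Xerces types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy tree and filter; hypothetical names, not the Xerces types.
enum class Action { Accept, Reject, Skip };

struct Elem {
    std::string name;
    std::vector<Elem> children;
};

using Filter = std::function<Action(const Elem&)>;

// Append the filtered form of `src` to `out`.
void filterInto(const Elem& src, const Filter& f, std::vector<Elem>& out) {
    switch (f(src)) {
    case Action::Reject:
        return;  // the node and its whole subtree are dropped
    case Action::Skip:
        // the node itself disappears, its children move up one level
        for (const Elem& c : src.children) filterInto(c, f, out);
        return;
    case Action::Accept: {
        Elem copy{src.name, {}};
        for (const Elem& c : src.children) filterInto(c, f, copy.children);
        out.push_back(copy);
        return;
    }
    }
}

// Count all elements in a forest.
int countElems(const std::vector<Elem>& forest) {
    int n = 0;
    for (const Elem& e : forest) n += 1 + countElems(e.children);
    return n;
}
```

With a tree ROOT(DATA(ROW, ROW), INFO), rejecting DATA leaves ROOT and INFO, whereas skipping DATA leaves ROOT with the two ROW elements and INFO as direct children. Under reject semantics the filter does not have to reject the children separately.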


Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

Hi Mirko,
are you sure that your root node isn't one of those DATA elements? In
this case the document node would see more than one root element.

Alberto
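
The hypothesis above rests on the DOM rule that a document node may have only one element child; a second one raises HIERARCHY_REQUEST_ERR, whose numeric value is 3, matching the reported exception code. A minimal sketch of that rule follows. `ToyDocument`, `ToyDomError` and `attachTopLevel` are hypothetical names, and whether Xerces 3.0.1 actually reaches this state when a filter removes the root element is not confirmed in this thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Value of HIERARCHY_REQUEST_ERR in the W3C DOM; matches "code is: 3" in the thread.
const int HIERARCHY_REQUEST_ERR = 3;

// Toy exception and document; hypothetical names, not the Xerces classes.
struct ToyDomError : std::runtime_error {
    int code;
    explicit ToyDomError(int c)
        : std::runtime_error("hierarchy request error"), code(c) {}
};

class ToyDocument {
public:
    // A document may own at most one element child (its root).
    void appendElement(const std::string& name) {
        if (!roots_.empty()) throw ToyDomError(HIERARCHY_REQUEST_ERR);
        roots_.push_back(name);
    }

private:
    std::vector<std::string> roots_;
};

// Attach the given top-level elements to a fresh document.
// Returns 0 on success, or the error code of the failure.
int attachTopLevel(const std::vector<std::string>& names) {
    ToyDocument doc;
    try {
        for (const std::string& n : names) doc.appendElement(n);
    } catch (const ToyDomError& e) {
        return e.code;
    }
    return 0;
}
```

If the root element were removed and its children ended up directly under the document, the second of those children would trigger exactly this kind of failure in the model.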


Re: RE: method startElement() from class DOMLSParserFilter

by Mirko Braun-2

Hi John,

As far as I understand the explanation of the method startElement() in the
API reference, there are no children: "The element node passed to startElement
for filtering will include all of the attributes, but none of the children
nodes." As a consequence, the removal of the children must be done
by the parser internally. Is this correct?

Best regards
Mirko
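
One way a parser could drop the children internally is a depth counter: once an element is refused at its start tag, everything up to the matching end tag is ignored. The sketch below shows that idea on a hypothetical event stream. `Kind`, `Ev` and `nodesKept` are names invented here, and this is an assumption about how such suppression might work, not the Xerces implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event model; not the Xerces parser interface.
enum class Kind { Start, Text, End };

struct Ev {
    Kind kind;
    std::string value;  // element name for Start/End, character data for Text
};

// Count the nodes (elements and text) that would reach the tree when every
// element named `rejected` is refused at its start tag.
int nodesKept(const std::vector<Ev>& events, const std::string& rejected) {
    int kept = 0;
    int suppressDepth = 0;  // > 0 while inside a rejected element
    for (const Ev& ev : events) {
        if (suppressDepth > 0) {
            // Children of a rejected element never reach the filter or the tree.
            if (ev.kind == Kind::Start) ++suppressDepth;
            else if (ev.kind == Kind::End) --suppressDepth;
            continue;
        }
        if (ev.kind == Kind::Start) {
            // Only the name (and, in a real parser, the attributes) is known here.
            if (ev.value == rejected) suppressDepth = 1;
            else ++kept;
        } else if (ev.kind == Kind::Text) {
            ++kept;
        }
    }
    assert(suppressDepth == 0);  // every rejected element was closed
    return kept;
}
```

The decision is taken once, from the start tag alone, and the filter is never asked about the children, which is consistent with the quoted API description.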


-------- Original-Nachricht --------
> Datum: Fri, 4 Sep 2009 08:11:14 -0400
> Von: John Lilley <jlilley@...>
> An: "c-users@..." <c-users@...>
> Betreff: RE: method startElement() from class DOMLSParserFilter

> Forgive my ignorance, but could it be that you must reject not only the
> node you don't want, but all of its children as well?
>
> john
>
> -----Original Message-----
> From: Mirko Braun [mailto:mirko.braun@...]
> Sent: Friday, September 04, 2009 6:01 AM
> To: c-users@...
> Subject: Re: method startElement() from class DOMLSParserFilter
>
>
> Hi Alberto,
>
> thank you for you answer. I integrated the changes you
> suggested, but the result is still the same:
>
> DOM Error during parsing:
> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> DOMException code is:  3
> Message is: attempt is made to insert a node where it is not permitted
>
> Best regards,
> Mirko
>
> -------- Original-Nachricht --------
> > Datum: Fri, 04 Sep 2009 12:37:10 +0200
> > Von: Alberto Massari <amassari@...>
> > An: c-users@...
> > Betreff: Re: method startElement() from class DOMLSParserFilter
>
> > Hi Mirko,
> > I think the current implementation of the DOMLSParserFilter doesn't work
> > nicely with your code, as the rejected nodes are not recycled and the
> > memory will grow to the same level as before.
> > Anyhow, you should instead override acceptNode like this:
> >
> > DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> > {
> >   // Reject every element named "DATA"; accept all other nodes.
> >   // The parameter is a DOMNode*, matching the declaration in DOMPrintFilter.hpp.
> >   if (node->getNodeType() == DOMNode::ELEMENT_NODE &&
> >       XMLString::compareString(node->getNodeName(), element_data) == 0)
> >     return DOMParserFilter::FILTER_REJECT;
> >   else
> >     return DOMParserFilter::FILTER_ACCEPT;
> > }
> >
> > Then, change DOMLSParserImpl::endElement to add a call to
> > origNode->release() after the call to removeChild().
> >
> > Alberto

Re: method startElement() from class DOMLSParserFilter

by Mirko Braun-2

Hi Alberto,

Yes, I'm sure that DATA is not the root node. I debugged a little:
the exception occurs after the sixth time a DATA node was found.

Mirko

-------- Original Message --------
> Date: Fri, 04 Sep 2009 14:21:15 +0200
> From: Alberto Massari <amassari@...>
> To: c-users@...
> Subject: Re: method startElement() from class DOMLSParserFilter

> Hi Mirko,
> are you sure that your root node isn't one of those DATA elements? In
> this case the document node would see more than one root element.
>
> Alberto

Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

Hi Mirko,
are you still using startElement()? That API would mess with the current
parent, so it would break the parsing at a certain point.

Alberto
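
One way to picture "messing with the current parent" is the toy model below. It is plain C++ and purely hypothetical: `ToyBuilder`, `ToyDomError`, `parseWithRejectedData` and the bookkeeping bug itself are invented for illustration and are not taken from the Xerces-C sources. The builder does not push a rejected element onto its parent stack but still pops on the matching end tag, so every rejection moves the current parent up one level. Once the current parent has drifted up to the document node, the next element looks like a second root, which the model reports with DOM error code 3 (HIERARCHY_REQUEST_ERR), the code shown in the error output in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// DOM exception code 3 as defined by the W3C DOM specification.
constexpr int HIERARCHY_REQUEST_ERR = 3;

// Hypothetical error type carrying a DOM-style numeric code.
struct ToyDomError : std::runtime_error {
    int code;
    explicit ToyDomError(int c)
        : std::runtime_error("attempt to insert a node where it is not permitted"),
          code(c) {}
};

// Hypothetical builder that tracks only nesting depth: depth 0 means the
// current parent is the document node, which may hold a single root element.
class ToyBuilder {
public:
    void startElement(const std::string& name) {
        if (name == "DATA") return;  // rejected: deliberately NOT pushed (the bug)
        if (depth_ == 0) {
            if (hasRoot_) throw ToyDomError(HIERARCHY_REQUEST_ERR);
            hasRoot_ = true;
        }
        ++depth_;
    }

    void endElement(const std::string& /*name*/) {
        // Pops unconditionally, even for elements that were never pushed.
        if (depth_ > 0) --depth_;
    }

    std::size_t depth() const { return depth_; }

private:
    std::size_t depth_ = 0;
    bool hasRoot_ = false;
};

// Feeds <ROOT><A><DATA/>...<DATA/><INFO/></A></ROOT> with `dataCount` DATA
// elements and returns the DOM error code raised, or 0 if parsing succeeds.
inline int parseWithRejectedData(std::size_t dataCount) {
    ToyBuilder builder;
    try {
        builder.startElement("ROOT");
        builder.startElement("A");
        for (std::size_t i = 0; i < dataCount; ++i) {
            builder.startElement("DATA");
            builder.endElement("DATA");
        }
        builder.startElement("INFO");
        builder.endElement("INFO");
        builder.endElement("A");
        builder.endElement("ROOT");
    } catch (const ToyDomError& e) {
        return e.code;
    }
    return 0;
}
```

In this model the failure does not appear on the first rejected element but only after as many rejections as the current nesting depth. That would be consistent with an error that shows up on a later DATA element, though whether Xerces-C 3.0.1 actually behaves this way is not confirmed here.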

Mirko Braun wrote:

> Hi Alberto,
>
> Yes, I'm sure that DATA is not the root node. I debugged a little:
> the exception occurs after the sixth time a DATA node was found.
>
> Mirko


RE: RE: method startElement() from class DOMLSParserFilter

by John Lilley

I'm afraid I don't know the answer.
john

-----Original Message-----
From: Mirko Braun [mailto:mirko.braun@...]
Sent: Friday, September 04, 2009 7:18 AM
To: c-users@...
Subject: Re: RE: method startElement() from class DOMLSParserFilter

Hi John,

As far as I understand the description of startElement() in the API reference,
the element has no children at that point: "The element node passed to startElement for filtering will include all of the attributes, but none of the children nodes." Consequently, the children of a rejected element must be removed
by the parser internally. Is this correct?

Best regards
Mirko



Re: method startElement() from class DOMLSParserFilter

by Mirko Braun-2


Hi Alberto,

Yes, I'm still using the method startElement(). Is it better
to use the method acceptNode() to reject the DATA node from
the DOM, or is there another possibility?

Mirko
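
For comparison, the acceptNode() route suggested earlier in the thread can be pictured with the toy model below. Again this is plain C++ rather than Xerces-C code, and `Stats`, `CountedNode`, `Tag`, `rejectWhenComplete` and `buildAndFilter` are hypothetical names. The filter is asked only when an element is complete; a rejected element is then detached from its parent and freed, which loosely mirrors the removeChild() plus release() combination mentioned above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Tracks how many toy nodes are alive, as a stand-in for memory use.
struct Stats {
    std::size_t live = 0;
    std::size_t peak = 0;
};

// Hypothetical node that registers itself with a Stats object.
struct CountedNode {
    std::string name;
    std::vector<std::unique_ptr<CountedNode>> children;
    Stats* stats;

    CountedNode(std::string n, Stats* s) : name(std::move(n)), stats(s) {
        ++stats->live;
        if (stats->live > stats->peak) stats->peak = stats->live;
    }
    ~CountedNode() { --stats->live; }
    CountedNode(const CountedNode&) = delete;
    CountedNode& operator=(const CountedNode&) = delete;
};

// One start tag (open == true) or end tag (open == false).
struct Tag {
    bool open;
    std::string name;
};

// Filter in the spirit of acceptNode(): called once the element is complete.
inline bool rejectWhenComplete(const CountedNode& node) {
    return node.name == "DATA";
}

// Builds the tree and drops rejected elements as soon as they are complete.
// Assumes well-formed input; `stats` must outlive the returned tree.
inline std::unique_ptr<CountedNode> buildAndFilter(const std::vector<Tag>& tags,
                                                   Stats& stats) {
    auto document = std::make_unique<CountedNode>("#document", &stats);
    std::vector<CountedNode*> parents{document.get()};

    for (const Tag& tag : tags) {
        if (tag.open) {
            auto child = std::make_unique<CountedNode>(tag.name, &stats);
            CountedNode* raw = child.get();
            parents.back()->children.push_back(std::move(child));
            parents.push_back(raw);
        } else {
            CountedNode* finished = parents.back();
            parents.pop_back();
            if (rejectWhenComplete(*finished)) {
                // The finished element is always the last child of its parent;
                // dropping it frees the element and its whole subtree.
                parents.back()->children.pop_back();
            }
        }
    }
    return document;
}
```

The trade-off in this model is that a rejected DATA element is still built in full before it is dropped. Peak memory is therefore bounded by the kept tree plus the largest single DATA subtree, not by the sum of all of them, and only if the removed nodes are really freed.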


-------- Original Message --------
> Date: Fri, 04 Sep 2009 15:41:54 +0200
> From: Alberto Massari <amassari@...>
> To: c-users@...
> Subject: Re: method startElement() from class DOMLSParserFilter

> Hi Mirko,
> are you still using startElement()? That API would mess with the current
> parent, so it would break the parsing at a certain point.
>
> Alberto

Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

In fact, I am seeing so many problems with that code that my only
suggestion is to get the latest 3.0 from the trunk and work with
what I have just committed (or get the patch from
http://svn.apache.org/viewvc?rev=811420&view=rev and apply it to the 3.0.1
code). This version should support your original code.

Alberto
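
For readers following the thread: DOMException code 3 is HIERARCHY_REQUEST_ERR in the DOM specification, which fits the diagnosis that the parent bookkeeping went wrong while elements were being rejected. The sketch below is a toy model only, not the Xerces-C++ implementation and not the committed patch. `Node`, `ToyBuilder`, `feedSample` and `countNodes` are made-up names, and no Xerces headers are used. It assumes a builder that keeps a stack of open elements and shows one way to reject an element at start-element time: count nesting depth while inside the rejected subtree and leave the current parent untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DOM element; NOT a Xerces-C++ type.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(std::string n) : name(std::move(n)) {}
};

// Hypothetical tree builder driven by start/end element events.
class ToyBuilder {
public:
    explicit ToyBuilder(std::string rejectedName)
        : root_(new Node("#document")), rejected_(std::move(rejectedName)) {
        open_.push_back(root_.get());
    }

    void startElement(const std::string& name) {
        if (skipDepth_ > 0) { ++skipDepth_; return; }       // nested inside a rejected element
        if (name == rejected_) { skipDepth_ = 1; return; }  // reject: build nothing, keep parent
        Node* parent = open_.back();
        parent->children.push_back(std::unique_ptr<Node>(new Node(name)));
        open_.push_back(parent->children.back().get());
    }

    void endElement() {
        if (skipDepth_ > 0) { --skipDepth_; return; }  // closing a skipped element
        open_.pop_back();                              // closing an accepted element
    }

    const Node& document() const { return *root_; }
    std::size_t openDepth() const { return open_.size(); }

private:
    std::unique_ptr<Node> root_;
    std::vector<Node*> open_;  // stack of currently open, accepted elements
    std::string rejected_;
    int skipDepth_ = 0;        // > 0 while inside a rejected element
};

// Counts a node plus all of its descendants.
std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const auto& c : n.children) total += countNodes(*c);
    return total;
}

// Feeds the events for <root><item><DATA><x/></DATA></item><item/></root>.
void feedSample(ToyBuilder& b) {
    b.startElement("root");
    b.startElement("item");
    b.startElement("DATA");
    b.startElement("x");
    b.endElement();  // </x>
    b.endElement();  // </DATA>
    b.endElement();  // </item>
    b.startElement("item");
    b.endElement();  // </item>
    b.endElement();  // </root>
}
```

Constructing `ToyBuilder b("DATA")` and calling `feedSample(b)` leaves a tree with the document node, `root` and two `item` elements. The second `item` still lands under `root` because the skipped end tags never pop the stack of open elements.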


Mirko Braun wrote:

> Hi Alberto,
>
> yes, I'm still using the method startElement(). Is it better
> to use the method acceptNode() to reject the DATA node from
> the DOM, or is there another possibility?
>
> Mirko

Re: method startElement() from class DOMLSParserFilter

by Mirko Braun-2

Hi Alberto,

thank you very much for your help. I integrated the patch into
3.0.1 and it worked: there is no exception any more.
But one problem remains. Memory usage is still about the
same as before. I would expect memory usage to decrease when
a node is rejected from the tree. Is that conclusion
correct?

Mirko

-------- Original Message --------
> Date: Fri, 04 Sep 2009 16:12:16 +0200
> From: Alberto Massari <amassari@...>
> To: c-users@...
> Subject: Re: method startElement() from class DOMLSParserFilter

> In effect I am seeing so many problems with that code that the only
> suggestion I have is to get the latest 3.0 from the trunk and work with
> what I have just committed (or get the patch from
> http://svn.apache.org/viewvc?rev=811420&view=rev and apply to the 3.0.1
> code). This version should support your original code.
>
> Alberto

Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

Mirko Braun wrote:
> Hi Alberto,
>
> thank you very much for your help. I integrated the patch in
> 3.0.1 and it worked. There is no exception any more.
> But there is still one problem. The usage of memory is still
> of the same size. I think if a node is rejected from the tree
> the usage of memory should also decrease. Is my conclusion
> correct?
>  

Yes, if a node is rejected it should be marked for recycling; how much
memory are you seeing being used?

Alberto


Re: method startElement() from class DOMLSParserFilter

by Mirko Braun-2


Sorry, I don't know exactly how much memory is used. I only looked at the
maximum memory usage shown in the Task Manager (Windows XP). Whether or not
I use a DOMLSParserFilter, the DOMPrint.exe process uses the same amount of
memory.
The DATA elements that I want to reject have very large values, and I
assumed that rejecting these nodes would also remove them from
memory. Does "be marked for recycling" mean that these DATA nodes
remain in memory?

Mirko
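
One way to picture "marked for recycling" is a pool with a free list. The following is a toy model in standard C++, not Xerces-C++ code: Node, NodePool, allocate(), release() and slotsOwnedAfter() are hypothetical names chosen for illustration. It assumes that a released slot is only remembered for reuse and that the pool never hands memory back to the operating system, which would be consistent with a process whose maximum memory in the Task Manager does not drop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model only: these names are hypothetical and do not correspond
// to Xerces-C++ classes or methods.
struct Node {
    std::string value;
};

class NodePool {
public:
    // Hand out a recycled slot if one is available, otherwise grow the pool.
    Node* allocate(const std::string& value) {
        Node* n = nullptr;
        if (!freeList_.empty()) {
            n = freeList_.back();
            freeList_.pop_back();
        } else {
            slots_.emplace_back();
            n = &slots_.back();
        }
        n->value = value;
        return n;
    }

    // Releasing does not shrink the pool: the slot is only remembered
    // so that a later allocate() can reuse it.
    void release(Node* n) {
        n->value.clear();
        freeList_.push_back(n);
    }

    std::size_t slotsOwned() const { return slots_.size(); }
    std::size_t slotsFree() const { return freeList_.size(); }

private:
    std::deque<Node> slots_;       // deque keeps element addresses stable when it grows
    std::vector<Node*> freeList_;  // released slots waiting to be reused
};

// Allocate `count` nodes one after another, optionally releasing each one
// immediately (as a filter rejecting the node might), and report how many
// slots the pool ends up owning.
std::size_t slotsOwnedAfter(std::size_t count, bool releaseEach) {
    NodePool pool;
    for (std::size_t i = 0; i < count; ++i) {
        Node* n = pool.allocate("DATA payload");
        if (releaseEach) {
            pool.release(n);
        }
    }
    return pool.slotsOwned();
}
```

In this model, slotsOwnedAfter(1000, true) leaves the pool owning a single slot, while slotsOwnedAfter(1000, false) leaves it owning 1000. Recycling therefore limits further growth, but it only helps if the released slots are actually reused by later allocations.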


Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

When you call release() on a node, the node is not deleted (its
memory comes from a pool that can be deleted as a whole). Instead, it is
placed in a "recycle bin", from which it is taken when a new node of the
same type is requested. So the next element will not allocate extra
memory; it will reuse that node. What I still need to check is whether
text nodes do the same with the buffer used to keep the node value, and
how those buffers are recycled (i.e. whether the big buffer used by a
DATA node is reused for a much smaller node).
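
The recycling behaviour described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions, not the Xerces-C++ implementation: `Node`, `NodePool`, `create()`, `allocated()`, `recycled()` and `allocationsAfterRejecting()` are hypothetical names, and only `release()` borrows its name from the message. It shows why a released node lets the next node reuse its slot, while the memory held by the pool itself does not shrink.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a DOM node; not a Xerces-C++ type.
struct Node {
    int payload = 0;
};

// Minimal pool with a "recycle bin" (a free list). Illustrative only.
class NodePool {
public:
    // Hand out a recycled node if one is available, otherwise allocate.
    Node* create() {
        if (!recycled_.empty()) {
            Node* n = recycled_.back();
            recycled_.pop_back();
            *n = Node{};  // reset state before the node is reused
            return n;
        }
        storage_.push_back(std::unique_ptr<Node>(new Node()));
        return storage_.back().get();
    }

    // The node is not deleted here; it is only parked for later reuse.
    void release(Node* n) {
        assert(n != nullptr);
        recycled_.push_back(n);
    }

    std::size_t allocated() const { return storage_.size(); }
    std::size_t recycled() const { return recycled_.size(); }

private:
    std::vector<std::unique_ptr<Node>> storage_;  // freed as a whole with the pool
    std::vector<Node*> recycled_;                 // the recycle bin
};

// Simulates rejecting `count` elements one after another: each node is
// created and released before the next one is requested.
std::size_t allocationsAfterRejecting(std::size_t count) {
    NodePool pool;
    for (std::size_t i = 0; i < count; ++i) {
        Node* n = pool.create();
        pool.release(n);
    }
    return pool.allocated();
}
```

Calling `allocationsAfterRejecting(1000)` in this sketch returns 1: the pool never grows past one node, but it also never gives that node's memory back until the pool itself is destroyed. Whether the value buffers of text nodes behave the same way is exactly the open question in the message.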

Alberto

Mirko Braun wrote:

> Sorry, I don't know how much memory is used. I just looked at the
> peak memory usage in the Task Manager (Windows XP). Whether or not
> I use a DOMLSParserFilter, the DOMPrint.exe process uses the same amount of memory.
> The DATA elements that I want to reject have very large values,
> and I assumed that rejecting these nodes would also remove them from
> memory. Does "marked for recycling" mean that these DATA nodes
> remain in memory?
>
> Mirko

Re: method startElement() from class DOMLSParserFilter

by Mirko Braun-2


Hi Alberto,

Did you have time to check whether text nodes "do the same
with the buffer used to keep the node value, and how they are recycled
(i.e. if the big buffer used by DATA nodes is reused for a much smaller
node)"?

Mirko

-------- Original-Nachricht --------
> Datum: Tue, 08 Sep 2009 09:37:52 +0200
> Von: Alberto Massari <amassari@...>
> An: c-users@...
> Betreff: Re: method startElement() from class DOMLSParserFilter

> When you call release() on a node, the node is not deleted (as its
> memory comes from a pool that can be deleted as a whole) but it's placed
> in a "recycle bin" from where it is taken when a new node of the same
> type is requested. So, the next element will not allocate extra memory,
> but reuse that node. What I need to check is if node texts do the same
> with the buffer used to keep the node value, and how they are recycled
> (i.e. if the big buffer used by DATA nodes is reused for a much smaller
> node)
>
> Alberto
>
> Mirko Braun wrote:
> > Sorry, I don't know how much memory is used. I just had a look at the
> > maximum used memory in the task manager (Window XP). It doesn't
> > matter if i used a DOMLSParserFilter or not the process DOMPrint.exe
> used the same size of memory.
> > The XML-Elements DATA which i want to reject have very large values
> > and i think if i reject these nodes they are also removed from
> > memory. Does "be marked for recycling" mean, that these DATA nodes
> > remain in memory?
> >
> > Mirko
> >
> > -------- Original-Nachricht --------
> >  
> >> Datum: Mon, 07 Sep 2009 09:26:05 +0200
> >> Von: Alberto Massari <amassari@...>
> >> An: c-users@...
> >> Betreff: Re: method startElement() from class DOMLSParserFilter
> >>    
> >
> >  
> >> Mirko Braun wrote:
> >>    
> >>> Hi Alberto,
> >>>
> >>> thank you very much for your help. I integrated the patch in
> >>> 3.0.1 and it worked. There is no exception any more.
> >>> But there is still one problem. The usage of memory is still
> >>> of the same size. I think if a node is rejected from the tree
> >>> the usage of memory should also decrease. Is my conclusion
> >>> correct?
> >>>  
> >>>      
> >> Yes, if a node is rejected is should be marked for recycling; how much
> >> memory are you seeing is been used?
> >>
> >> Alberto
> >>
> >>    
> >>> Mirko
> >>>
> >>> -------- Original-Nachricht --------
> >>>  
> >>>      
> >>>> Datum: Fri, 04 Sep 2009 16:12:16 +0200
> >>>> Von: Alberto Massari <amassari@...>
> >>>> An: c-users@...
> >>>> Betreff: Re: method startElement() from class DOMLSParserFilter
> >>>>    
> >>>>        
> >>>  
> >>>      
> >>>> In effect I am seeing so many problems with that code that the only
> >>>> suggestion I have is to get the latest 3.0 from the trunk and work
> with
> >>>> what I have just committed (or get the patch from
> >>>> http://svn.apache.org/viewvc?rev=811420&view=rev and apply to the
> 3.0.1
> >>>> code). This version should support your original code.
> >>>>
> >>>> Alberto
> >>>>
> >>>>
> >>>> Mirko Braun wrote:
> >>>>    
> >>>>        
> >>>>> Hi Alberto,
> >>>>>
> >>>>> yes, i'm still using the method startElement(). Is it better
> >>>>> to use the method acceptNode() to reject the DATA node from
> >>>>> the DOM or is there any other possibility?
> >>>>>
> >>>>> Mirko
> >>>>>
> >>>>>
> >>>>> -------- Original-Nachricht --------
> >>>>>  
> >>>>>      
> >>>>>          
> >>>>>> Datum: Fri, 04 Sep 2009 15:41:54 +0200
> >>>>>> Von: Alberto Massari <amassari@...>
> >>>>>> An: c-users@...
> >>>>>> Betreff: Re: method startElement() from class DOMLSParserFilter
> >>>>>>    
> >>>>>>        
> >>>>>>            
> >>>>>  
> >>>>>      
> >>>>>          
> >>>>>> Hi Mirko,
> >>>>>> are you still using startElement()? That API would mess with the
> >>>>>>        
> >>>>>>            
> >>>> current
> >>>>    
> >>>>        
> >>>>>> parent, so it would break the parsing at a certain point.
> >>>>>>
> >>>>>> Alberto
> >>>>>>
> >>>>>> Mirko Braun wrote:
> >>>>>>    
> >>>>>>        
> >>>>>>            
> >>>>>>> Hi Alberto,
> >>>>>>>
> >>>>>>> yes i'm sure that DATA is not a root node. I debugged a little
> bit.
> >>>>>>> The exception occurs after the sixth time this DATA node was
> found.
> >>>>>>>
> >>>>>>> Mirko
> >>>>>>>
> >>>>>>> -------- Original-Nachricht --------
> >>>>>>>  
> >>>>>>>      
> >>>>>>>          
> >>>>>>>              
> >>>>>>>> Datum: Fri, 04 Sep 2009 14:21:15 +0200
> >>>>>>>> Von: Alberto Massari <amassari@...>
> >>>>>>>> An: c-users@...
> >>>>>>>> Betreff: Re: method startElement() from class DOMLSParserFilter
> >>>>>>>>    
> >>>>>>>>        
> >>>>>>>>            
> >>>>>>>>                
> >>>>>>>  
> >>>>>>>      
> >>>>>>>          
> >>>>>>>              
> >>>>>>>> Hi Mirko,
> >>>>>>>> are you sure that your root node isn't one of those DATA
> elements?
> >>>>>>>>                
> >> In
> >>    
> >>>>>>>> this case the document node would see more than one root element.
> >>>>>>>>
> >>>>>>>> Alberto
> >>>>>>>>
> >>>>>>>> Mirko Braun wrote:
> >>>>>>>>    
> >>>>>>>>        
> >>>>>>>>            
> >>>>>>>>                
> >>>>>>>>> Hi Alberto,
> >>>>>>>>>
> >>>>>>>>> thank you for you answer. I integrated the changes you
> >>>>>>>>> suggested, but the result is still the same:
> >>>>>>>>>
> >>>>>>>>> DOM Error during parsing:
> >>>>>>>>>
> >>>>>>>>>      
> >>>>>>>>>          
> >>>>>>>>>              
> >>>>>>>>>                  
> >>
> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>    
> >>>>    
> >>>>        
> >>>>>>    
> >>>>>>        
> >>>>>>            
> >>>>>>>>    
> >>>>>>>>        
> >>>>>>>>            
> >>>>>>>>                
> >>>>>>>>> DOMException code is:  3
> >>>>>>>>> Message is: attempt is made to insert a node where it is not
> >>>>>>>>>              
> >>>>>>>>>                  
> >>>> permitted
> >>>>    
> >>>>        
> >>>>>>>>> Best regards,
> >>>>>>>>> Mirko
> >>>>>>>>>
> >>>>>>>>> -------- Original-Nachricht --------
> >>>>>>>>>  
> >>>>>>>>>      
> >>>>>>>>>          
> >>>>>>>>>              
> >>>>>>>>>                  
> >>>>>>>>>> Datum: Fri, 04 Sep 2009 12:37:10 +0200
> >>>>>>>>>> Von: Alberto Massari <amassari@...>
> >>>>>>>>>> An: c-users@...
> >>>>>>>>>> Betreff: Re: method startElement() from class DOMLSParserFilter
> >>>>>>>>>>    
> >>>>>>>>>>        
> >>>>>>>>>>            
> >>>>>>>>>>                
> >>>>>>>>>>                    
> >>>>>>>>>  
> >>>>>>>>>      
> >>>>>>>>>          
> >>>>>>>>>              
> >>>>>>>>>                  
> >>>>>>>>>> Hi Mirko,
> >>>>>>>>>> I think the current implementation of the DOMLSParserFilter
> >>>>>>>>>>                    
> >> doesn't
> >>    
> >>>>>>>>>>        
> >>>>>>>>>>            
> >>>>>>>>>>                
> >>>>>>>>>>                    
> >>>>>>>> work
> >>>>>>>>    
> >>>>>>>>        
> >>>>>>>>            
> >>>>>>>>                
> >>>>>>>>>> nicely with your code, as the rejected nodes are not recycled
> and
> >>>>>>>>>>                
> >>>>>>>>>>                    
> >>>> the
> >>>>    
> >>>>        
> >>>>>>>>>> memory will grow to the same level as before.
> >>>>>>>>>> Anyhow, you should instead override acceptNode like this:
> >>>>>>>>>>
> >>>>>>>>>> DOMParserFilter::FilterAction
> >>>>>>>>>>                
> >>>>>>>>>>                    
> >>>> DOMParserFilter::acceptNode(DOMElement*
> >>>>    
> >>>>        
> >>>>>>>>>> node)
> >>>>>>>>>> {
> >>>>>>>>>>   // for element whose name is "DATA", skip it
> >>>>>>>>>>    if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >>>>>>>>>> XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>>>>      return DOMParserFilter::FILTER_REJECT;
> >>>>>>>>>>   else
> >>>>>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> Then, change DOMLSParserImpl::endElement to add a call to
> >>>>>>>>>> origNode->release() after the call to removeChild().
> >>>>>>>>>>
> >>>>>>>>>> Alberto
> >>>>>>>>>>
> >>>>>>>>>>

Re: method startElement() from class DOMLSParserFilter

by Alberto Massari

Hi Mirko,
sorry for the late answer. The DOM document does reuse that text
fragment, but it doesn't try to reuse it for a similarly sized string. So
it gets reused immediately, maybe to store just a couple of characters
(and that doesn't help reduce the memory footprint).

Alberto
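
To make the effect described above concrete, here is a toy model in C++. It is NOT Xerces code; the class TextPool and the function reservedAfterReuse are made-up names, and the real pool inside Xerces may work differently. It only sketches the general idea: released buffers go into a recycle bin, the next request takes whatever buffer is there regardless of its size, and nothing is ever handed back to the system.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy model only (hypothetical names, not the Xerces implementation).
class TextPool {
public:
    typedef std::vector<char> Buffer;

    // Store `text`, preferring any recycled buffer regardless of its size.
    Buffer* store(const std::string& text) {
        Buffer* buf = nullptr;
        if (!recycleBin_.empty()) {
            buf = recycleBin_.back();
            recycleBin_.pop_back();
        } else {
            owned_.push_back(std::unique_ptr<Buffer>(new Buffer()));
            buf = owned_.back().get();
        }
        // Overwrites the contents; the already reserved capacity is kept.
        buf->assign(text.begin(), text.end());
        return buf;
    }

    // "Releasing" only puts the buffer into the bin; no memory is freed.
    void release(Buffer* buf) { recycleBin_.push_back(buf); }

    // Bytes the pool keeps reserved, whether or not they hold useful text.
    std::size_t reservedBytes() const {
        std::size_t total = 0;
        for (std::size_t i = 0; i < owned_.size(); ++i)
            total += owned_[i]->capacity();
        return total;
    }

private:
    std::vector<std::unique_ptr<Buffer> > owned_;  // pool owns every buffer
    std::vector<Buffer*> recycleBin_;              // buffers waiting for reuse
};

// Store a large value, release it, then store a two-character value.
std::size_t reservedAfterReuse(std::size_t bigSize) {
    TextPool pool;
    TextPool::Buffer* data = pool.store(std::string(bigSize, 'x'));
    pool.release(data);
    pool.store("ab");  // lands in the big buffer
    return pool.reservedBytes();
}
```

In this model the tiny string ends up occupying the buffer that used to hold the large DATA value, so the reserved size stays at the level of the large value. That would match a Task Manager figure that does not go down after the DATA nodes are rejected.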

Mirko Braun wrote:

> Hi Alberto,
>
> did you have the time to check "if node texts do the same
> with the buffer used to keep the node value, and how they are recycled
> (i.e. if the big buffer used by DATA nodes is reused for a much smaller
> node)"?
>
> Mirko
>
> -------- Original Message --------
>> Date: Tue, 08 Sep 2009 09:37:52 +0200
>> From: Alberto Massari <amassari@...>
>> To: c-users@...
>> Subject: Re: method startElement() from class DOMLSParserFilter
>>
>> When you call release() on a node, the node is not deleted (as its
>> memory comes from a pool that can be deleted as a whole) but it's placed
>> in a "recycle bin" from where it is taken when a new node of the same
>> type is requested. So, the next element will not allocate extra memory,
>> but reuse that node. What I need to check is if node texts do the same
>> with the buffer used to keep the node value, and how they are recycled
>> (i.e. if the big buffer used by DATA nodes is reused for a much smaller
>> node)
>>
>> Alberto
>>
>> Mirko Braun wrote:
>>
>>> Sorry, I don't know how much memory is used. I just had a look at the
>>> maximum used memory in the task manager (Windows XP). It doesn't
>>> matter whether I use a DOMLSParserFilter or not; the process
>>> DOMPrint.exe used the same amount of memory.
>>> The XML-Elements DATA which I want to reject have very large values
>>> and I thought that if I reject these nodes they are also removed from
>>> memory. Does "be marked for recycling" mean that these DATA nodes
>>> remain in memory?
>>>
>>> Mirko
>>>
>>> -------- Original Message --------
>>>> Date: Mon, 07 Sep 2009 09:26:05 +0200
>>>> From: Alberto Massari <amassari@...>
>>>> To: c-users@...
>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>
>>>> Mirko Braun wrote:
>>>>
>>>>> Hi Alberto,
>>>>>
>>>>> thank you very much for your help. I integrated the patch in
>>>>> 3.0.1 and it worked. There is no exception any more.
>>>>> But there is still one problem. The usage of memory is still
>>>>> of the same size. I think if a node is rejected from the tree
>>>>> the usage of memory should also decrease. Is my conclusion
>>>>> correct?
>>>>>
>>>> Yes, if a node is rejected it should be marked for recycling; how much
>>>> memory are you seeing being used?
>>>>
>>>> Alberto


RE: method startElement() from class DOMLSParserFilter

by John Lilley

Some suggestions: if you do not require the DOM itself, you might use the SAX parser interface. It is not really much harder than the DOM interface, although the method-callback mechanism takes some getting used to. Alternatively, if it is OK to use the memory temporarily, you could deep-copy the filtered DOM to a new DOM and discard the original.

john
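
For the second suggestion, a rough sketch of the copy-and-discard idea on a toy tree (plain C++, no Xerces; Node, addChild, copyWithout and countNodes are made-up names for illustration). With a real Xerces DOM one would presumably build the copy with DOMDocument::importNode(node, true) on a fresh document and then call release() on the old one, but that is an assumption about how to apply it and is not shown here.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DOM element (hypothetical type, not a Xerces class).
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node> > children;
};

// Append a child to `parent` and return a reference to it.
Node& addChild(Node& parent, const std::string& name, const std::string& text) {
    std::unique_ptr<Node> child(new Node);
    child->name = name;
    child->text = text;
    parent.children.push_back(std::move(child));
    return *parent.children.back();
}

// Deep-copy `src`, leaving out every subtree whose name equals `skipName`.
std::unique_ptr<Node> copyWithout(const Node& src, const std::string& skipName) {
    std::unique_ptr<Node> dst(new Node);
    dst->name = src.name;
    dst->text = src.text;
    for (std::size_t i = 0; i < src.children.size(); ++i) {
        const Node& child = *src.children[i];
        if (child.name != skipName)
            dst->children.push_back(copyWithout(child, skipName));
    }
    return dst;
}

// Count the nodes of a tree, including its root.
std::size_t countNodes(const Node& node) {
    std::size_t total = 1;
    for (std::size_t i = 0; i < node.children.size(); ++i)
        total += countNodes(*node.children[i]);
    return total;
}
```

Once the filtered copy exists, the original tree can be destroyed as a whole, so the large DATA values are really given back instead of being recycled. The price is that both trees are in memory at the same time for a moment.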

-----Original Message-----
From: Alberto Massari [mailto:amassari@...]
Sent: Tuesday, September 22, 2009 12:41 PM
To: c-users@...
Subject: Re: method startElement() from class DOMLSParserFilter

Hi Mirko,
sorry for the late answer. The DOM document does reuse that text
fragment, but it doesn't try to reuse it for a similarly sized string. So
it gets reused immediately, maybe to store just a couple of characters
(and that doesn't help reduce the memory footprint).

Alberto



How to get Xerces to recognize external entity callouts

by DeWayne

Hello guys

I'm getting Xerces parse errors, and I believe it is because the entity callouts cannot be located (see below). How do I get Xerces to follow the URL in the entity callout to resolve this? I'm running Xerces 2.7.0 and am not sure whether this feature is supported. Do I need to upgrade Xerces?


Snippet of the xml file
--------------------------------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mpd SYSTEM "mpboe03.dtd" [
<!ENTITY % isobox PUBLIC "-//W3C//ENTITIES Box and Line Drawing//EN//XML" "http://www.w3.org/2003/entities/2007/isobox.ent" >
 %isobox;
<!ENTITY % isoamsc PUBLIC "-//W3C//ENTITIES Added Math Symbols: Delimiters//EN//XML" "http://www.w3.org/2003/entities/2007/isoamsc.ent" >
 %isoamsc;  


DeWayne Dantzler


Re: How to get Xerces to recognize external entity callouts

by Alberto Massari

Hi,
how are you invoking the parsing? Maybe you disabled external entity
resolution, or you didn't compile a NetAccessor into Xerces.

Alberto

