method startElement() from class DOMLSParserFilter

Hello everybody,

I would like to parse a quite large XML file (about 180 MB). I use the DOM interface because I need the tree for further processing of the data the XML file contains. Of course, a lot of memory is used while parsing the file, and I got an "Out of memory" exception.

I noticed that Xerces-C++ 3.0.1 (Win32) comes with a class DOMLSParserFilter, which makes it possible to filter nodes during parsing. That is perfect for me, because one XML element in my large file contains most of the data. This element is called DATA and appears several times in my XML file. So I had the idea to reject this element from the DOM tree during parsing, using the method startElement() of the DOMLSParserFilter class, in order to reduce the memory used. After that I would use a SAX parser and just get all DATA elements with their values.

But it does not work. I integrated my code into the DOMPrint example which comes with Xerces-C++ 3.0.1. The following error message occurred:

DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
DOMException code is: 3
Message is: attempt is made to insert a node where it is not permitted

Did I misunderstand the functionality of the DOMLSParserFilter class and its method startElement()? Is it possible to realize my idea with the help of this class? Did I do something wrong in my code (please have a look below)?

I would be very grateful for any help.

Thanks in advance,
Mirko

DOMPrintFilter.hpp:
--------------------

class DOMParserFilter : public DOMLSParserFilter {
public:

    DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
    ~DOMParserFilter(){};

    virtual FilterAction startElement(DOMElement* node);
    virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
    virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};

private:
    DOMNodeFilter::ShowType fWhatToShow;
};

DOMPrintFilter.cpp:
--------------------

DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
:fWhatToShow(whatToShow)
{}

DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
{
    // for element whose name is "DATA", skip it
    if (XMLString::compareString(node->getNodeName(), element_data)==0)
        return DOMParserFilter::FILTER_REJECT;
    else
        return DOMParserFilter::FILTER_ACCEPT;
}

DOMPrint.cpp:
---------------

static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };

xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);

xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);

DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);

DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
parser->setFilter(pDOMParserFilter);

//
// Parse the XML file, catching any XML exceptions that might propagate
// out of it.
//
bool errorsOccured = false;
DOMDocument *doc = NULL;

try
{
    doc = parser->parseURI(gXmlFile);
}
catch (const OutOfMemoryException&)
{
    XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
    errorsOccured = true;
}
catch (const XMLException& e)
{
    XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n Message: "
        << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
    errorsOccured = true;
}
catch (const DOMException& e)
{
    const unsigned int maxChars = 2047;
    XMLCh errText[maxChars + 1];

    XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
        << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;

    if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
        XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;

    errorsOccured = true;
}
catch (...)
{
    XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
    errorsOccured = true;
}
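The idea in the post above (reject every DATA element at its start tag so that neither it nor its subtree ends up in the tree) can be sketched without Xerces-C++ at all. The following is only a toy model under stated assumptions: `Node`, `Event`, `buildTree` and the `reject` callback are hypothetical names invented for this illustration, and the code says nothing about how DOMLSParserImpl is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for a DOM tree and a parser event stream.
// None of these types come from Xerces-C++.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

enum class EventKind { Start, End };

struct Event {
    EventKind kind;
    std::string name;
};

// Builds a tree from a flat list of start/end events. If reject(name) returns
// true at a start event, that element and its whole subtree are skipped.
// Returns nullptr when the root element itself is rejected.
std::unique_ptr<Node> buildTree(
    const std::vector<Event>& events,
    const std::function<bool(const std::string&)>& reject)
{
    std::unique_ptr<Node> root;
    std::vector<Node*> open;  // stack of accepted elements that are still open
    int skipDepth = 0;        // greater than 0 while inside a rejected subtree

    for (const Event& ev : events) {
        if (ev.kind == EventKind::Start) {
            if (skipDepth > 0 || reject(ev.name)) {
                ++skipDepth;  // remember nesting so the matching end is found
                continue;
            }
            auto node = std::make_unique<Node>();
            node->name = ev.name;
            Node* raw = node.get();
            if (open.empty()) {
                assert(!root && "input must have exactly one root element");
                root = std::move(node);
            } else {
                open.back()->children.push_back(std::move(node));
            }
            open.push_back(raw);
        } else if (skipDepth > 0) {
            --skipDepth;      // leaving (part of) a rejected subtree
        } else {
            assert(!open.empty() && "end event without matching start");
            open.pop_back();
        }
    }
    return root;
}
```

In this sketch a rejected element is never allocated, which is the memory saving the post is hoping for; whether a real parser avoids the allocation is a separate question.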
|
|
Re: method startElement() from class DOMLSParserFilter

Hi Mirko,

I think the current implementation of DOMLSParserFilter doesn't work nicely with your code, as the rejected nodes are not recycled and the memory will grow to the same level as before. Anyhow, you should instead override acceptNode like this:

DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
{
    // for element whose name is "DATA", skip it
    if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
        XMLString::compareString(node->getNodeName(), element_data)==0)
        return DOMParserFilter::FILTER_REJECT;
    else
        return DOMParserFilter::FILTER_ACCEPT;
}

Then, change DOMLSParserImpl::endElement to add a call to origNode->release() after the call to removeChild().

Alberto
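The distinction drawn in the reply above, between unlinking a rejected node from its parent and actually recycling its memory, can be illustrated with a small stand-alone model. This is a sketch under assumptions, not Xerces-C++ code: `ToyDocument`, its slot pool and `parseRejectingAll` are hypothetical names, and the real memory manager may behave differently in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy document that owns every node it creates in a pool of slots.
// Hypothetical type; it only mimics the idea of document-owned node memory.
class ToyDocument {
public:
    // Hands out a node slot, reusing a released one when available.
    std::size_t createNode(const std::string& name) {
        if (!freeList_.empty()) {
            const std::size_t id = freeList_.back();
            freeList_.pop_back();
            slots_[id] = Slot{name, false};
            return id;
        }
        slots_.push_back(Slot{name, false});
        return slots_.size() - 1;
    }

    void appendChild(std::size_t id) { slots_.at(id).attached = true; }

    // Unlinks the node from the tree; its slot stays allocated.
    void removeChild(std::size_t id) { slots_.at(id).attached = false; }

    // Returns a detached node's slot to the pool for reuse.
    void release(std::size_t id) {
        assert(!slots_.at(id).attached && "only detached nodes may be released");
        freeList_.push_back(id);
    }

    std::size_t slotsAllocated() const { return slots_.size(); }

private:
    struct Slot {
        std::string name;
        bool attached;
    };
    std::vector<Slot> slots_;
    std::vector<std::size_t> freeList_;
};

// Simulates parsing elementCount rejected elements and reports how many
// slots the document ended up allocating.
std::size_t parseRejectingAll(std::size_t elementCount, bool releaseRejected) {
    ToyDocument doc;
    for (std::size_t i = 0; i < elementCount; ++i) {
        const std::size_t id = doc.createNode("DATA");
        doc.appendChild(id);
        doc.removeChild(id);               // rejected: unlinked from the tree
        if (releaseRejected) {
            doc.release(id);               // and additionally recycled
        }
    }
    return doc.slotsAllocated();
}
```

In this model, rejecting 1000 elements without `release` leaves 1000 slots allocated, while rejecting them with `release` keeps reusing a single slot. That is the effect the suggested `origNode->release()` call is meant to have.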
|
|
Re: method startElement() from class DOMLSParserFilter

Hi Alberto,

thank you for your answer. I integrated the changes you suggested, but the result is still the same:

DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
DOMException code is: 3
Message is: attempt is made to insert a node where it is not permitted

Best regards,
Mirko
|
|
RE: method startElement() from class DOMLSParserFilter

Forgive my ignorance, but could it be that you must reject not only the node you don't want, but all of its children as well?

john
|
|
Re: method startElement() from class DOMLSParserFilter

Hi Mirko,

are you sure that your root node isn't one of those DATA elements? In this case the document node would see more than one root element.

Alberto
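For context on the question above: in the DOM specification, exception code 3 is HIERARCHY_REQUEST_ERR, and a document node may have only one element child. The sketch below models just that rule. It is a toy under assumptions: `ToyDocumentNode`, `ToyDomError` and `attachTopLevel` are hypothetical names, and the code does not reproduce how Xerces-C++ reaches this state internally.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// The DOM specification assigns HIERARCHY_REQUEST_ERR the numeric code 3.
constexpr int kHierarchyRequestErr = 3;

// Hypothetical exception type carrying a DOM-style error code.
struct ToyDomError : std::runtime_error {
    int code;
    ToyDomError(int c, const std::string& msg)
        : std::runtime_error(msg), code(c) {}
};

// Toy document node that enforces the single-root-element rule.
class ToyDocumentNode {
public:
    void appendElement(const std::string& name) {
        assert(!name.empty() && "element names must not be empty");
        if (!elements_.empty()) {
            throw ToyDomError(kHierarchyRequestErr,
                              "document already has a root element");
        }
        elements_.push_back(name);
    }

    std::size_t elementCount() const { return elements_.size(); }

private:
    std::vector<std::string> elements_;
};

// Tries to attach every name directly below the document node.
// Returns 0 on success, otherwise the error code of the failure.
int attachTopLevel(const std::vector<std::string>& names) {
    ToyDocumentNode doc;
    try {
        for (const std::string& name : names) {
            doc.appendElement(name);
        }
    } catch (const ToyDomError& e) {
        return e.code;
    }
    return 0;
}
```

If filtering removed the real root and several of its former children were then attached directly to the document, the second attachment would fail with code 3 in this model, which matches the number in the reported error message.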
|
|
Re: RE: method startElement() from class DOMLSParserFilter

Hi John,

as far as I understand the description of the method startElement() in the API reference, there are no children at that point:

"The element node passed to startElement for filtering will include all of the attributes, but none of the children nodes."

As a consequence, the removal of the children must be done by the parser internally. Is this correct?

Best regards,
Mirko
|
|
Re: method startElement() from class DOMLSParserFilter

Hi Alberto,

yes, I'm sure that DATA is not the root node. I debugged a little: the exception occurs after the sixth time a DATA node is found.

Mirko

-------- Original Message --------
> Date: Fri, 04 Sep 2009 14:21:15 +0200
> From: Alberto Massari <amassari@...>
> To: c-users@...
> Subject: Re: method startElement() from class DOMLSParserFilter

> Hi Mirko,
> are you sure that your root node isn't one of those DATA elements? In
> this case the document node would see more than one root element.
>
> Alberto
>
> Mirko Braun wrote:
> > Hi Alberto,
> >
> > thank you for your answer. I integrated the changes you
> > suggested, but the result is still the same:
> >
> > DOM Error during parsing:
> > 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> > DOMException code is: 3
> > Message is: attempt is made to insert a node where it is not permitted
> >
> > Best regards,
> > Mirko
|
|
Re: method startElement() from class DOMLSParserFilter

Hi Mirko,

are you still using startElement()? That API would mess with the current parent, so it would break the parsing at a certain point.

Alberto
|
|
RE: RE: method startElement() from class DOMLSParserFilter

I'm afraid I don't know the answer.

john

-----Original Message-----
From: Mirko Braun [mailto:mirko.braun@...]
Sent: Friday, September 04, 2009 7:18 AM
To: c-users@...
Subject: Re: RE: method startElement() from class DOMLSParserFilter

Hi John,

as far as I understand the description of startElement() in the API reference, the element has no children at that point:

"The element node passed to startElement for filtering will include all of the attributes, but none of the children nodes."

Consequently, the children must be removed internally by the parser. Is this correct?

Best regards
Mirko

-------- Original Message --------
> Date: Fri, 4 Sep 2009 08:11:14 -0400
> From: John Lilley <jlilley@...>
> To: "c-users@..." <c-users@...>
> Subject: RE: method startElement() from class DOMLSParserFilter

> Forgive my ignorance, but could it be that you must reject not only the
> node you don't want, but all of its children as well?
>
> john
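The question raised above, whether rejecting an element at startElement() time also means dropping its children, can be illustrated with a toy model. The sketch below is not Xerces-C++ code and makes no claim about how DOMLSParserImpl is written; `Node`, `FilteringBuilder` and `countNodes` are hypothetical names invented for the illustration. It assumes balanced start/end events and shows one plausible way a parser could skip a rejected subtree internally: a depth counter swallows every event inside the rejected element, so the "current parent" pointer is never moved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Xerces-C++ types.
struct Node {
    explicit Node(std::string n) : name(std::move(n)) {}
    std::string name;
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
};

class FilteringBuilder {
public:
    explicit FilteringBuilder(std::string rejected)
        : rejected_(std::move(rejected)),
          root_(new Node("#document")),
          current_(root_.get()) {}

    void startElement(const std::string& name) {
        if (skipDepth_ > 0 || name == rejected_) {
            ++skipDepth_;   // inside a rejected subtree: build nothing
            return;
        }
        std::unique_ptr<Node> node(new Node(name));
        node->parent = current_;
        Node* raw = node.get();
        current_->children.push_back(std::move(node));
        current_ = raw;     // descend into the accepted element
    }

    void endElement() {
        if (skipDepth_ > 0) {
            --skipDepth_;   // leaving a skipped element; current_ is untouched
            return;
        }
        current_ = current_->parent;
    }

    const Node& document() const { return *root_; }

    // True once every start event has been matched by an end event.
    bool balanced() const { return current_ == root_.get() && skipDepth_ == 0; }

private:
    std::string rejected_;
    std::unique_ptr<Node> root_;
    Node* current_;
    std::size_t skipDepth_ = 0;
};

// Counts the node itself plus all of its descendants.
std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const auto& child : n.children) total += countNodes(*child);
    return total;
}
```

In this model the filter never has to reject the children itself; they are simply never created, which is consistent with the quoted API sentence that the element passed to startElement() has no children yet.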
|
|
Re: method startElement() from class DOMLSParserFilter

Hi Alberto,

yes, I'm still using the method startElement(). Is it better to use the method acceptNode() to reject the DATA node from the DOM, or is there any other possibility?

Mirko
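As a rough illustration of the acceptNode()-style alternative asked about here, the following toy model consults the filter only once an element is complete: the subtree is built first and then detached from its parent. It is a sketch under stated assumptions, not Xerces-C++ code; `Counter`, `Node` and `PostFilterBuilder` are hypothetical names, and balanced start/end events are assumed. The live-node counter is there to show the point made earlier in the thread: memory only stays bounded if the detached subtree is actually freed, which is what the suggested release() call after removeChild() is for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model; these are not Xerces-C++ classes.
struct Counter {
    std::size_t live = 0;   // nodes currently allocated
    std::size_t peak = 0;   // highest value 'live' ever reached
};

struct Node {
    Node(std::string n, Counter& c) : name(std::move(n)), counter(c) {
        ++counter.live;
        counter.peak = std::max(counter.peak, counter.live);
    }
    ~Node() { --counter.live; }

    std::string name;
    Counter& counter;
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
};

class PostFilterBuilder {
public:
    PostFilterBuilder(std::string rejected, Counter& c)
        : rejected_(std::move(rejected)),
          counter_(c),
          root_(new Node("#document", c)),
          current_(root_.get()) {}

    void startElement(const std::string& name) {
        // Every element is built, even one that will be rejected later.
        std::unique_ptr<Node> node(new Node(name, counter_));
        node->parent = current_;
        Node* raw = node.get();
        current_->children.push_back(std::move(node));
        current_ = raw;
    }

    void endElement() {
        Node* finished = current_;
        current_ = finished->parent;
        if (finished->name == rejected_) {
            // The finished element is the last child of its parent here.
            // pop_back() detaches it and frees the whole subtree at once.
            current_->children.pop_back();
        }
    }

private:
    std::string rejected_;
    Counter& counter_;
    std::unique_ptr<Node> root_;
    Node* current_;
};
```

With this approach the peak memory is one rejected subtree on top of the accepted tree, rather than the whole document, provided the detached nodes really are destroyed.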
|
|
Re: method startElement() from class DOMLSParserFilter
In effect, I am seeing so many problems with that code that my only suggestion is to get the latest 3.0 from the trunk and work with what I have just committed (or get the patch from http://svn.apache.org/viewvc?rev=811420&view=rev and apply it to the 3.0.1 code). This version should support your original code.

Alberto
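For readers following the thread, here is a minimal sketch of the general idea behind rejecting an element when its start tag is seen: the builder keeps a skip counter and ignores every event until the matching end tag, so the current parent is never changed by the rejected subtree. This is a toy model in standard C++ only. ToyNode, ToyAction, ToyTreeBuilder and countKeptElements are hypothetical names, and nothing here is taken from the Xerces-C++ sources or from the patch mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; none of these are Xerces-C++ types.
struct ToyNode {
    explicit ToyNode(std::string n) : name(std::move(n)) {}
    std::string name;
    ToyNode* parent = nullptr;
    std::vector<std::unique_ptr<ToyNode>> children;
};

enum class ToyAction { Accept, Reject };

class ToyTreeBuilder {
public:
    using Filter = std::function<ToyAction(const std::string&)>;

    explicit ToyTreeBuilder(Filter filter)
        : filter_(std::move(filter)),
          root_(new ToyNode("#document")),
          current_(root_.get()) {}

    void startElement(const std::string& name) {
        // Already inside a rejected subtree: only track the nesting depth.
        if (skipDepth_ > 0) { ++skipDepth_; return; }
        // Ask the filter before any node is created or attached.
        if (filter_(name) == ToyAction::Reject) { skipDepth_ = 1; return; }
        std::unique_ptr<ToyNode> child(new ToyNode(name));
        child->parent = current_;
        ToyNode* raw = child.get();
        current_->children.push_back(std::move(child));
        current_ = raw;
    }

    void endElement() {
        // End tags of skipped elements must not move the current parent.
        if (skipDepth_ > 0) { --skipDepth_; return; }
        current_ = current_->parent;
    }

    const ToyNode& document() const { return *root_; }

private:
    Filter filter_;
    std::unique_ptr<ToyNode> root_;
    ToyNode* current_;
    std::size_t skipDepth_ = 0;
};

// Counts all element nodes below the given node.
std::size_t countElements(const ToyNode& node) {
    std::size_t total = 0;
    for (const auto& child : node.children) total += 1 + countElements(*child);
    return total;
}

// Feeds a small fixed event stream and returns how many elements were kept.
std::size_t countKeptElements(bool rejectData) {
    ToyTreeBuilder builder([rejectData](const std::string& name) {
        return (rejectData && name == "DATA") ? ToyAction::Reject : ToyAction::Accept;
    });
    builder.startElement("ROOT");
    builder.startElement("HEADER"); builder.endElement();
    builder.startElement("DATA");
    builder.startElement("ITEM"); builder.endElement();
    builder.endElement();
    builder.startElement("DATA"); builder.endElement();
    builder.startElement("FOOTER"); builder.endElement();
    builder.endElement();
    return countElements(builder.document());
}
```

Calling countKeptElements(true) builds the tree without the DATA elements and their children, while countKeptElements(false) keeps everything. In this toy the rejected nodes are never allocated at all; whether a real parser can avoid the allocation depends on its implementation.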
|
|
Re: method startElement() from class DOMLSParserFilter
Hi Alberto,

thank you very much for your help. I integrated the patch into 3.0.1 and it worked. There is no exception any more.

But there is still one problem: the memory usage is still the same. I think that if a node is rejected from the tree, the memory usage should also decrease. Is my conclusion correct?

Mirko
|
|
Re: method startElement() from class DOMLSParserFilter
Mirko Braun wrote:
> Hi Alberto,
>
> thank you very much for your help. I integrated the patch in
> 3.0.1 and it worked. There is no exception any more.
> But there is still one problem. The usage of memory is still
> of the same size. I think if a node is rejected from the tree
> the usage of memory should also decrease. Is my conclusion
> correct?

Yes, if a node is rejected it should be marked for recycling; how much memory do you see being used?

Alberto
|
|
Re: method startElement() from class DOMLSParserFilter

Sorry, I don't know exactly how much memory is used. I only looked at the peak memory usage shown in the Task Manager (Windows XP). Whether or not I use a DOMLSParserFilter, the DOMPrint.exe process uses the same amount of memory.

The DATA elements that I want to reject have very large values, and I assumed that rejecting these nodes would also remove them from memory. Does "be marked for recycling" mean that these DATA nodes remain in memory?

Mirko
|
|
|
Re: method startElement() from class DOMLSParserFilter

When you call release() on a node, the node is not deleted (as its memory comes from a pool that can be deleted as a whole); instead it is placed in a "recycle bin", from which it is taken when a new node of the same type is requested. So the next element will not allocate extra memory, but will reuse that node.

What I still need to check is whether text nodes do the same with the buffer used to keep the node value, and how those buffers are recycled (i.e. whether the big buffer used by DATA nodes is reused for a much smaller node).

Alberto
|
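The recycling behaviour Alberto describes for release() can be illustrated with a toy pool. This is only a sketch under stated assumptions: `Node` and `NodePool` are hypothetical names invented for this example, not Xerces-C++ classes, and the real allocator may differ in detail. It shows why releasing a node would not by itself shrink the process footprint: the node goes back into a bin and is handed out again on the next request.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; not a Xerces-C++ type.
struct Node {
    std::string value;
};

// Toy pool with a "recycle bin". All names here are illustrative only.
class NodePool {
public:
    // Hand out a recycled node if one is available, otherwise grow the pool.
    Node* create(const std::string& value) {
        Node* node = nullptr;
        if (!recycleBin_.empty()) {
            node = recycleBin_.back();
            recycleBin_.pop_back();
        } else {
            storage_.push_back(std::make_unique<Node>());
            node = storage_.back().get();
        }
        node->value = value;
        return node;
    }

    // Releasing frees nothing; the node merely becomes available for reuse.
    void release(Node* node) {
        recycleBin_.push_back(node);
    }

    // Number of nodes ever allocated by the pool; this never decreases.
    std::size_t allocated() const { return storage_.size(); }

    // Number of released nodes currently waiting to be reused.
    std::size_t recycled() const { return recycleBin_.size(); }

private:
    std::vector<std::unique_ptr<Node>> storage_;  // freed only when the pool is destroyed
    std::vector<Node*> recycleBin_;
};
```

In this sketch, creating a node, releasing it, and creating another returns the same address and leaves `allocated()` at 1. Whether a large value buffer attached to a recycled node is also reused or shrunk is exactly the open question in the message above, and the sketch does not answer it.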
|
|
Re: method startElement() from class DOMLSParserFilter
Hi Alberto,

did you have time to check "if node texts do the same with the buffer used to keep the node value, and how they are recycled (i.e. if the big buffer used by DATA nodes is reused for a much smaller node)"?

Mirko

-------- Original Message --------
> Date: Tue, 08 Sep 2009 09:37:52 +0200
> From: Alberto Massari <amassari@...>
> To: c-users@...
> Subject: Re: method startElement() from class DOMLSParserFilter
>
> When you call release() on a node, the node is not deleted (as its
> memory comes from a pool that can be deleted as a whole) but it's placed
> in a "recycle bin" from where it is taken when a new node of the same
> type is requested. So, the next element will not allocate extra memory,
> but reuse that node. What I need to check is if node texts do the same
> with the buffer used to keep the node value, and how they are recycled
> (i.e. if the big buffer used by DATA nodes is reused for a much smaller
> node)
>
> Alberto
|
|
|
Re: method startElement() from class DOMLSParserFilter
Hi Mirko,

sorry for the late answer. The DOM document does reuse that text fragment, but it doesn't try to use it for a similarly sized string. So it gets reused immediately, maybe to store just a couple of characters, and that doesn't help reduce the memory footprint.

Alberto

Mirko Braun wrote:
> Hi Alberto,
>
> did you have the time to check "if node texts do the same
> with the buffer used to keep the node value, and how they are recycled
> (i.e. if the big buffer used by DATA nodes is reused for a much smaller
> node)"?
>
> Mirko
|
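The recycling behaviour Alberto describes can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyBuffer` and `ToyTextPool` are hypothetical names, and the "take any recycled buffer that is big enough" policy is a simplification of what the thread reports, not the actual Xerces-C implementation.

```cpp
#include <cstddef>
#include <vector>

// Toy model only; this is NOT how Xerces-C is implemented internally.
struct ToyBuffer {
    std::size_t capacity;  // bytes reserved for this buffer
    std::size_t used;      // bytes currently holding text
};

class ToyTextPool {
public:
    // Hand out a buffer for `size` bytes. Any recycled buffer that is large
    // enough is taken first, with no attempt to match sizes.
    std::size_t acquire(std::size_t size) {
        for (std::size_t i = 0; i < recycled_.size(); ++i) {
            const std::size_t idx = recycled_[i];
            if (buffers_[idx].capacity >= size) {
                recycled_.erase(recycled_.begin() + static_cast<std::ptrdiff_t>(i));
                buffers_[idx].used = size;
                return idx;
            }
        }
        // Nothing suitable in the recycle bin: reserve fresh memory.
        buffers_.push_back(ToyBuffer{size, size});
        return buffers_.size() - 1;
    }

    // Put a buffer into the recycle bin; its memory stays reserved.
    void release(std::size_t idx) {
        buffers_[idx].used = 0;
        recycled_.push_back(idx);
    }

    // Total bytes the pool holds on to, whether in use or recycled.
    std::size_t reservedBytes() const {
        std::size_t total = 0;
        for (const ToyBuffer& b : buffers_) total += b.capacity;
        return total;
    }

    // Total bytes that actually carry text right now.
    std::size_t usedBytes() const {
        std::size_t total = 0;
        for (const ToyBuffer& b : buffers_) total += b.used;
        return total;
    }

private:
    std::vector<ToyBuffer> buffers_;
    std::vector<std::size_t> recycled_;
};
```

In this model, acquiring a large buffer, releasing it, and then acquiring a two-byte buffer hands the large buffer to the small request. The next large request therefore has to reserve fresh memory, which matches the observation that the process footprint does not shrink.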
|
|
RE: method startElement() from class DOMLSParserFilter
Some suggestions: if you do not require the DOM itself, you might use the SAX parser interface. It is not really much harder than the DOM interface, although the method-callback mechanism takes some getting used to. Alternatively, if it is OK to use the memory temporarily, you could deep-copy the filtered DOM to a new DOM and discard the original.

john

-----Original Message-----
From: Alberto Massari [mailto:amassari@...]
Sent: Tuesday, September 22, 2009 12:41 PM
To: c-users@...
Subject: Re: method startElement() from class DOMLSParserFilter

Hi Mirko,
sorry for the late answer; the DOM document is reusing that text fragment, but it doesn't try to use it for a similarly sized string. So, it gets reused immediately, maybe to store just a couple of characters (and that doesn't help reducing the memory footprint).

Alberto
|
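John's second suggestion (deep-copy the filtered tree into a fresh document, then discard the original) can be sketched with a standalone model. Everything here is an assumption made for illustration: `ToyNode`, `ToyDocument` and `copyInto` are hypothetical names and not part of Xerces-C. In Xerces-C terms the idea would presumably map to `DOMDocument::importNode` with `deep` set to true, followed by `release()` on the original document, but the thread does not confirm that this reclaims the memory.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a DOM node; NOT the Xerces-C API.
struct ToyNode {
    std::string name;
    std::string text;
    std::vector<std::size_t> children;  // indices into ToyDocument::arena
};

// Toy document that, like a pooled DOM, keeps every node it ever created.
struct ToyDocument {
    std::vector<ToyNode> arena;  // attached nodes and rejected leftovers alike
    std::size_t root = 0;

    std::size_t create(const std::string& name, const std::string& text) {
        arena.push_back(ToyNode{name, text, {}});
        return arena.size() - 1;
    }

    // Bytes of text held by the document, reachable from the root or not.
    std::size_t textBytes() const {
        std::size_t total = 0;
        for (const ToyNode& n : arena) total += n.text.size();
        return total;
    }
};

// Deep-copy the subtree rooted at srcIdx into dst and return the index of
// the copy. Only nodes reachable from srcIdx are copied, so detached
// leftovers in src.arena are left behind.
std::size_t copyInto(const ToyDocument& src, std::size_t srcIdx, ToyDocument& dst) {
    const ToyNode& node = src.arena[srcIdx];
    const std::size_t idx = dst.create(node.name, node.text);
    for (std::size_t child : node.children) {
        // Copy first, then index: dst.arena may reallocate during the call.
        const std::size_t copied = copyInto(src, child, dst);
        dst.arena[idx].children.push_back(copied);
    }
    return idx;
}
```

Once the copy exists, destroying the original toy document drops the leftovers with it. The price is that both trees are in memory at the same time for a moment, which is the temporary memory use John mentions.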
|
|
How to get Xerces to recognize external entity callouts
Hello guys,

I'm getting Xerces parse errors, and I believe it is because the entity callouts cannot be located (see below). How do I get Xerces to follow the URL in the entity callout to resolve this? I'm running Xerces 2.7.0 and am not sure whether this feature is supported. Do I need to upgrade Xerces?

Snippet of the XML file
--------------------------------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mpd SYSTEM "mpboe03.dtd" [
<!ENTITY % isobox PUBLIC "-//W3C//ENTITIES Box and Line Drawing//EN//XML" "http://www.w3.org/2003/entities/2007/isobox.ent" >
%isobox;
<!ENTITY % isoamsc PUBLIC "-//W3C//ENTITIES Added Math Symbols: Delimiters//EN//XML" "http://www.w3.org/2003/entities/2007/isoamsc.ent" >
%isoamsc;

DeWayne Dantlzer
|
|
|
Re: How to get Xerces to recognize external entity callouts
Hi,

how are you invoking the parsing? Maybe you disabled external entity resolution, or you didn't compile a NetAccessor into Xerces.

Alberto

Dantzler, DeWayne C wrote:
> Hello guys
>
> I'm getting Xerces parse errors and I believe it is because the entity callouts can not be located(see below). How do I get Xerces to follow the URL in the entity callout to resolve this. I'm running Xerces 2.7.0 and not sure if this feature is supported. Do I need to upgrade Xerces?
>
> Snippet of the xml file
> --------------------------------------------------------------------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE mpd SYSTEM "mpboe03.dtd" [
> <!ENTITY % isobox PUBLIC "-//W3C//ENTITIES Box and Line Drawing//EN//XML" "http://www.w3.org/2003/entities/2007/isobox.ent" >
> %isobox;
> <!ENTITY % isoamsc PUBLIC "-//W3C//ENTITIES Added Math Symbols: Delimiters//EN//XML" "http://www.w3.org/2003/entities/2007/isoamsc.ent" >
> %isoamsc;
>
> DeWayne Dantlzer
|