Parsing HTML documents?

Hello.

I have been looking for a C library that provides a DOM interface to parsed HTML documents, but I have been struggling to make it work the way I'd like (probably because I'm using it incorrectly, no doubt!).

Firstly, can gdome be used to parse HTML documents? I am aware that it's more geared towards XML, which, although similar, has obvious differences!

In any case, I'm using one of the examples as a start, but I'm getting this error while calling parse():

parser error : StartTag: invalid element name
<!doctype html><head><title>

I guess the main question I have is: am I using the right tool? Or should I be using something more suited to HTML? If so, would anybody have any recommendations? I need to be able to modify various components of the DOM, and it needs to be written in C (or C++, but preferably C).

Many thanks
--
Bradley Kite
_______________________________________________
gdome mailing list
gdome@...
http://mail.gnome.org/mailman/listinfo/gdome

Re: Parsing HTML documents?

Hello Bradley,

On Sun, Aug 16, 2009 at 10:59 PM, Bradley Kite <bradley.kite@...> wrote:
> I guess the main question I have is: am I using the right tool? Or
> should I be using something more suited to HTML? If so, would anybody
> have any recommendations? I need to be able to modify various
> components of the DOM, and it needs to be written in C (or C++, but
> preferably C).

I don't remember the current status of HTML support in Gdome2. Anyway, I'd recommend having a look at libxml2:

http://xmlsoft.org/

It is the engine that underlies Gdome2 (which is just a wrapper on top of it), and it definitely supports HTML parsing. Its API is not exactly the DOM one (which is why Gdome2 exists), but it gets very close. And it's written in plain C.

Best regards,
--luca

Re: Parsing HTML documents?

Hello Bradley,

gdome2 works only with well-formed XML documents. The only released modules (with respect to the DOM specifications) are Main, Events and XPath; the HTML module is an old, non-working, unmaintained implementation.

If you deal only with well-formed XHTML documents, you can use gdome2; otherwise, as Luca already said, libxml2 is the right choice for you.

Best regards,
Paolo.

Re: Parsing HTML documents?

2009/8/17 Paolo Casarini <paolo@...>:
> Hello Bradley,
>
> gdome2 works only with well formed XML documents, the only released
> modules (wrt DOM specifications) are Main, Events and XPath: the HTML module
> is an old, not working and not maintained implementation.
>
> If you deal only with well formed XHTML documents, you can use gdome2,
> otherwise, as Luca already said, libxml2 is the right choice for you.
>
> Best regards,
> Paolo.

Hi Luca and Paolo,

Many thanks for your responses. I have started using libxml2 and found it to be exactly what I require!

Kind Regards
--
Brad.