parsing ill-formed rss

I'm using ROME to parse generated RSS feeds, but of course many RSS feeds are ill-formed XML and cause conforming XML parsers to fail. (An example of the actual ill-formed RSS feed I'm getting is below.)

Does anyone have experience using a "loose parser" that is forgiving of ill-formed XML? I know the kind of "loose rules" I would like to apply, but I don't want to have to implement my own parser or stream filter. I'd prefer to use a loose parser that lets me hook in some specific behavior. Has anyone done this before?

Part of the feed I'm parsing looks like this:

...
<item>
<title>D;< ugggh.. [cousin's idiotic friends] stupid!!! >;[[[</title>
<link>http://twitter.com/Santaysiaaa/statuses/1920295596</link>
<description><![CDATA[ ]]></description>
<pubDate>Tue, 26 May 2009 04:28:33 +0000</pubDate>
<guid>http://twitter.com/Santaysiaaa/statuses/1920295596</guid>
</item>
...

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
Re: parsing ill-formed rss

Have you tried Tag Soup? <http://home.ccil.org/~cowan/XML/tagsoup/>

On Tue, Jun 2, 2009 at 5:15 PM, Aaron Dixon <atdixon@...> wrote:
> I'm using ROME to parse generated RSS feeds, but of course many RSS

--
Never did I see a second sun
Never did my skin touch a land of glass
Never did my rifle point but true
But in a land empty of enemies
Waiting for the tick-tick-tick of the want
A uranium angel
Crying “behold,”
This land that knew fire is yours
Taken from Corruption
To begin anew
Re: parsing ill-formed rss

I tried the Tag Soup parser but got the same issues.

I wrote an RssFixerReader that lets me register tag names that I expect might contain bad data. I wrap the RSS input reader with this fixer whenever I parse a feed that I expect to be ill-formed. It's a pretty simple state machine, but fairly tailored to my problem.

On Tue, Jun 2, 2009 at 4:21 PM, Charles HOPE <lookslikeiwasright@...> wrote:
> Have you tried Tag Soup? <http://home.ccil.org/~cowan/XML/tagsoup/>
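For anyone landing on this thread later, here is a rough sketch of the kind of wrapper described above. It is not the actual RssFixerReader (that code was never posted); the class name `RssFixerSketch` and its methods are made up for illustration. It also buffers the whole feed in memory rather than running a streaming state machine, and it assumes the registered tags appear without attributes and contain only text, so treat it as a starting point only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch; NOT the RssFixerReader mentioned in the thread. */
class RssFixerSketch extends Reader {
    private final Reader source;
    private final Set<String> tags;
    private Reader fixed; // built lazily on the first read

    RssFixerSketch(Reader source, String... tagNames) {
        this.source = source;
        this.tags = new LinkedHashSet<>(Arrays.asList(tagNames));
    }

    /** Escapes stray &, < and > in the text content of each registered tag. */
    static String fix(String xml, Set<String> tags) {
        String result = xml;
        for (String tag : tags) {
            String open = "<" + tag + ">";   // assumption: no attributes on the tag
            String close = "</" + tag + ">";
            StringBuilder out = new StringBuilder();
            int pos = 0;
            while (true) {
                int start = result.indexOf(open, pos);
                int end = start < 0 ? -1 : result.indexOf(close, start + open.length());
                if (end < 0) break; // no further complete open/close pair
                out.append(result, pos, start + open.length());
                out.append(escape(result.substring(start + open.length(), end)));
                out.append(close);
                pos = end + close.length();
            }
            out.append(result.substring(pos));
            result = out.toString();
        }
        return result;
    }

    static String escape(String text) {
        // Leave CDATA sections alone; their content is already protected.
        if (text.trim().startsWith("<![CDATA[")) return text;
        // Escape '&' only when it does not already start an entity reference.
        return text.replaceAll("&(?!#?\\w+;)", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (fixed == null) {
            // Read the whole underlying feed, repair it, then serve the result.
            StringBuilder all = new StringBuilder();
            char[] chunk = new char[4096];
            int n;
            while ((n = source.read(chunk)) != -1) all.append(chunk, 0, n);
            fixed = new StringReader(fix(all.toString(), tags));
        }
        return fixed.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

To use something like this, you would wrap the feed's reader, for example `new RssFixerSketch(feedReader, "title")`, before handing it to whatever parses the feed. Anything that accepts a `java.io.Reader` should work.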