Help parsing http://feeds.guardian.co.uk/theguardian/rss

by herceg_novi

Hello, I am using ROME 1.0 to process the Guardian RSS feed:
http://feeds.guardian.co.uk/theguardian/rss

Unfortunately I am getting the following error:

Error parsing: http://feeds.guardian.co.uk/theguardian/rss: failed(2,200): com.sun.syndication.io.ParsingFeedException: Invalid XML: Error on line 207: XML document structures must start and end within the same entity.

Is there anything I can do to make ROME process this RSS feed?

Thanks!

Re: Help parsing http://feeds.guardian.co.uk/theguardian/rss

by Tilman Bender-2

Hmm,

the W3C Feed Validation Service reports that this feed does not comply with the spec:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ffeeds.guardian.co.uk%2Ftheguardian%2Frss

Maybe that is related to your problem.

Tilman Bender
Software Engineering student
Hochschule Heilbronn
tbender@...

In reply to herceg_novi's message of 5 November 2009, 16:12.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: Help parsing http://feeds.guardian.co.uk/theguardian/rss

by Tilman Bender-2

Can you share the code you use to download the feed document?
I just tested it with a tool based on ROME Fetcher, and the feed seems to parse fine.
At what point exactly does your error occur?
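How the feed is downloaded matters, because a client that stops reading after a fixed number of bytes hands the parser an incomplete document. Below is a minimal sketch in plain Java (no ROME). It assumes a size cap where a negative value means unlimited. `CappedDownload`, `Body`, and `read` are hypothetical names chosen for illustration and are not part of ROME or ROME Fetcher. The method keeps at most `limit` bytes and reports whether the cap cut the body short, so a silent truncation shows up before any parsing happens.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not ROME or ROME Fetcher API.
class CappedDownload {
    /** The bytes that were kept, plus whether the cap cut the stream short. */
    static final class Body {
        final byte[] bytes;
        final boolean truncated;

        Body(byte[] bytes, boolean truncated) {
            this.bytes = bytes;
            this.truncated = truncated;
        }
    }

    /**
     * Reads from in, keeping at most limit bytes (limit < 0 means unlimited).
     * truncated is true when the stream still had data once the cap was reached.
     */
    static Body read(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (limit >= 0 && out.size() + n > limit) {
                // Keep only the bytes that still fit under the cap, then stop reading.
                out.write(buf, 0, limit - out.size());
                return new Body(out.toByteArray(), true);
            }
            out.write(buf, 0, n);
        }
        return new Body(out.toByteArray(), false);
    }
}
```

In practice `in` would be the response stream of the HTTP connection (for example `URLConnection.getInputStream()`). Checking `truncated` before the bytes go to a feed parser separates a download problem from a genuine parsing problem.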


Re: Help parsing http://feeds.guardian.co.uk/theguardian/rss

by Martin Kurz

Hi,

the error message you provided suggests that the feed was temporarily not well-formed XML.
At least when I test the feed now with the following code fragment, there's no problem.

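// Imports (ROME 1.0): com.sun.syndication.feed.synd.SyndFeed,
// com.sun.syndication.io.SyndFeedInput, com.sun.syndication.io.XmlReader, java.net.URL
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(new URL(
        "http://feeds.guardian.co.uk/theguardian/rss")));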

Sincerely,

Martin


Re: Help parsing http://feeds.guardian.co.uk/theguardian/rss

by herceg_novi

Thanks. The parsing was actually done from Nutch, using the ROME library. After further investigation, it turned out that a Nutch property limits the size of the downloaded file, so the feed was being cut off before ROME parsed it. I changed that property to unlimited (set http.content.limit to -1 in nutch-site.xml), and the parsing now works well. Thanks everyone for your help!
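The fix can be checked with a small experiment: cut a well-formed feed off at a byte limit and parse what remains. Below is a minimal sketch that uses only the JDK's DOM parser. It assumes a cap that behaves like Nutch's http.content.limit as described above, meaning a number of bytes kept, with -1 meaning unlimited. `ContentLimitDemo`, `applyLimit`, and `parseError` are hypothetical names and are not Nutch or ROME API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not Nutch or ROME API.
class ContentLimitDemo {
    /** Keeps at most limit bytes of body; a negative limit (e.g. -1) means no truncation. */
    static byte[] applyLimit(byte[] body, int limit) {
        if (limit < 0 || body.length <= limit) {
            return body;
        }
        return Arrays.copyOf(body, limit);
    }

    /** Returns null if the bytes are well-formed XML, otherwise a description of the parse error. */
    static String parseError(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.toString(); // any other SAX failure, never null
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the JDK's bundled parser, a document cut off inside an element typically fails with an error like the "XML document structures must start and end within the same entity" message quoted at the start of the thread. The exact wording depends on where the cut falls. Passing -1, so that nothing is removed, lets the same bytes parse cleanly. This matches the effect of the nutch-site.xml change.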