On Wed, Jul 8, 2009 at 09:24, Saurabh Suman <saurabhsuman289@...> wrote:
>
> Hi,
> I want to parse a feed URL using Nutch. I tried to use the
> org.apache.nutch.parse.feed.FeedParser class. Its input is XML, and I gave
> it the link below as the XML:
>
> http://timesofindia.indiatimes.com/rssfeedsdefault.cms
>
> This URL contains all the RSS feeds for the newspaper. When I tried it
> through the Rome feed parser, it gave me all the permalinks, titles, dates,
> etc. But the Nutch parser does not give anything.
> How can I get all the permalinks, titles, and dates for this URL?
>
In conf/parse-plugins.xml:
<mimeType name="text/xml">
  <plugin id="parse-html" />
  <plugin id="parse-rss" />
  <plugin id="feed" />
</mimeType>
The URL you mentioned is served with a text/xml content type. Since you
probably also have parse-html enabled in your conf file, and it is listed
first for text/xml, parse-html tries to parse the feed before the feed
plugin gets a chance. Try moving the "feed" plugin higher, like so:
<mimeType name="text/xml">
  <plugin id="feed" />
  <plugin id="parse-html" />
  <plugin id="parse-rss" />
</mimeType>
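
If you want to double-check the order after editing, a quick sketch like the
one below lists the plugins mapped to a mime type in the order they appear in
the file. This is only an illustration, not part of Nutch: the function name
plugin_order and the SAMPLE string are made up here, and the <parse-plugins>
root element is an assumption added so the fragment parses on its own. It
also assumes Nutch tries the plugins in the order listed, which is what the
advice above relies on.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the reordered mapping above. The root
# element name is an assumption so the snippet is parseable by itself.
SAMPLE = """
<parse-plugins>
  <mimeType name="text/xml">
    <plugin id="feed" />
    <plugin id="parse-html" />
    <plugin id="parse-rss" />
  </mimeType>
</parse-plugins>
"""


def plugin_order(xml_text, mime_type):
    """Return the plugin ids listed for mime_type, in document order."""
    root = ET.fromstring(xml_text.strip())
    # iter() walks the whole tree, so the root element's name does not matter.
    for mime in root.iter("mimeType"):
        if mime.get("name") == mime_type:
            return [plugin.get("id") for plugin in mime.findall("plugin")]
    return []  # no mapping found for this mime type


if __name__ == "__main__":
    print(plugin_order(SAMPLE, "text/xml"))
```

For the sample above, "feed" should come out first in the list. To check your
real file, read conf/parse-plugins.xml into a string and pass that instead of
SAMPLE.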
--
Doğacan Güney