
Re: How to Parse Rss Feed URL

by Saurabh Suman


When I use org.apache.nutch.parse.rss.RSSParser, it works fine and I now get the URLs. Next I want to get the content behind them. How do I do this? Do I need to add all the URLs to the crawldb and then run the crawl command, or is there another way?
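One possible way to approach the crawldb route mentioned above is to write the extracted URLs into a plain-text seed list first. The sketch below assumes a Nutch 1.x style setup in which seed URLs live one per line in a text file inside a directory, and that directory is later handed to the inject step (for example `bin/nutch inject`; the exact arguments depend on the Nutch version). The class name `SeedListWriter`, the file name `seed.txt`, and the `urls` directory are illustrative choices, not part of Nutch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Nutch: turns a list of feed links
// into a seed file with one URL per line.
class SeedListWriter {

    // Writes the unique http(s) URLs to <seedDir>/seed.txt and returns that path.
    static Path writeSeeds(List<String> urls, Path seedDir) throws IOException {
        // LinkedHashSet drops duplicates while keeping the original order.
        Set<String> unique = new LinkedHashSet<>();
        for (String url : urls) {
            String trimmed = url.trim();
            // Keep only http and https links; anything else is skipped.
            if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
                unique.add(trimmed);
            }
        }
        Files.createDirectories(seedDir);
        Path seedFile = seedDir.resolve("seed.txt"); // assumed file name
        return Files.write(seedFile, unique, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder links standing in for the URLs returned by the RSS parser.
        List<String> links = Arrays.asList(
                "http://example.com/story-1",
                "http://example.com/story-2");
        Path seedFile = writeSeeds(links, Paths.get("urls"));
        System.out.println("Seed list written to " + seedFile);
    }
}
```

The resulting directory can then be used as the URL directory for the inject and crawl steps, if that is the route chosen.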

Hi,
I want to parse a feed URL using Nutch. I tried to use the org.apache.nutch.parse.feed.FeedParser class. Its input is XML, and for the XML I used the link below:
http://timesofindia.indiatimes.com/rssfeedsdefault.cms
This URL lists all the RSS feeds for the newspaper. When I used the Rome feed parser on it, I got the permalink, title, date, etc. for every entry, but the Nutch parser does not return anything.
How can I get the permalink, title, and date for every entry at this URL?
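For illustration, the fields asked about above (permalink, title, date) can be pulled out of RSS 2.0 XML with the JDK's built-in DOM parser alone. This is a minimal sketch, not the Nutch FeedParser or the Rome API; the class name `RssItemExtractor` and the sample feed are made up, and it assumes the input is already RSS 2.0 XML with `item` elements containing `title`, `link`, and `pubDate`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch or Rome.
class RssItemExtractor {

    // One feed entry; all fields are kept as the raw strings found in the XML.
    static final class Item {
        final String title;
        final String link;
        final String pubDate;

        Item(String title, String link, String pubDate) {
            this.title = title;
            this.link = link;
            this.pubDate = pubDate;
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // Parses RSS 2.0 XML and returns one Item per <item> element, in document order.
    static List<Item> parse(String rssXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));
        NodeList itemNodes = doc.getElementsByTagName("item");
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < itemNodes.getLength(); i++) {
            Element item = (Element) itemNodes.item(i);
            items.add(new Item(
                    firstText(item, "title"),
                    firstText(item, "link"),
                    firstText(item, "pubDate")));
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed; a real run would read the XML from the feed URL.
        String xml = "<rss version=\"2.0\"><channel><title>Sample feed</title>"
                + "<item><title>First story</title>"
                + "<link>http://example.com/first</link>"
                + "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>"
                + "</channel></rss>";
        for (Item item : parse(xml)) {
            System.out.println(item.title + " | " + item.link + " | " + item.pubDate);
        }
    }
}
```

If a parser returns nothing for a given URL, one thing worth checking is whether that URL actually serves feed XML or an HTML page that only links to the individual feeds.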

