RSS autodetect

4 Messages

RSS autodetect

by richiebabes

Before I look into ROME as a possible solution: does the feed source have to be a .rss web page, or does ROME autodetect the feed based on the URL given? I need something that can autodetect the RSS feed!!

Re: RSS autodetect

by Martin Kurz

Hi,

I don't exactly understand your question. Rome doesn't expect any
special URL scheme, so the feed you're reading can just as well end in .html or
.feed or whatever. As long as it's a (more or less) valid feed
(RSS/RDF/Atom), you should be able to read it with Rome. Rome doesn't
care about the URL given, but about the content delivered from that URL.

Greetings,

Martin

richiebabes wrote:
> Before I look into ROME as a possible solution does the feed source have to
> be a .rss web page or does it autodetect based on the URL given? I need
> something that can autodetect the rss feed!!

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@...
For additional commands, e-mail: dev-help@...
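Martin's point that Rome looks at the content rather than the URL can be illustrated with a minimal, JDK-only sketch that guesses the feed flavour from the root element of an already-fetched document. This is not Rome's own detection code. The class FeedTypeSniffer and its FeedType enum are hypothetical names, and for brevity the sketch checks only local names, not namespaces.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: classifies a feed by its root element, ignoring the URL it came from.
// It illustrates the idea only and is not Rome's internal detection code.
final class FeedTypeSniffer {

    enum FeedType { RSS, RDF, ATOM, UNKNOWN }

    private FeedTypeSniffer() {}

    static FeedType sniff(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // makes getLocalName() return "RDF" for <rdf:RDF>
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            switch (root.getLocalName()) {
                case "rss":
                    return FeedType.RSS;   // RSS 0.91 to 2.0
                case "RDF":
                    return FeedType.RDF;   // RSS 0.90 and 1.0 use an rdf:RDF root
                case "feed":
                    return FeedType.ATOM;  // Atom 0.3 and 1.0
                default:
                    return FeedType.UNKNOWN;
            }
        } catch (Exception e) {
            // Not well-formed XML (for example an ordinary HTML page): not readable as a feed
            return FeedType.UNKNOWN;
        }
    }
}
```

A document served from a URL ending in .html but containing <rss version="2.0"> would still come back as RSS. An ordinary web page comes back as UNKNOWN, and that second case is where feed autodiscovery, as asked about in this thread, comes in.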


Re: RSS autodetect

by Joseph Ottinger

ROME doesn't autodetect RSS feed URLs. Jakarta's FeedParser had code to do it, but I haven't been able to find FeedParser's code anywhere.

On Fri, Jul 3, 2009 at 1:39 PM, richiebabes <rich.g.morgan@...> wrote:

Before I look into ROME as a possible solution does the feed source have to
be a .rss web page or does it autodetect based on the URL given? I need
something that can autodetect the rss feed!!
--
View this message in context: http://www.nabble.com/RSS-autodetect-tp24326590p24326590.html
Sent from the Rome - Development mailing list archive at Nabble.com.






--
Joseph B. Ottinger
http://enigmastation.com

RE: RSS autodetect

by Nick Lothian

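Here’s some quick & dirty code to do this. You’ll need TagSoup (and Commons HttpClient 3.x):

    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import java.net.URL;

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    // Hypothetical wrapper class and Commons Logging logger so the method compiles on its own
    public class RssFeedFinder {

        private static final Log LOG = LogFactory.getLog(RssFeedFinder.class);

        public static String findRssFeedOnWebpage(String url, int timeoutMillis) {
            String rssFeedUrl = null;
            GetMethod getMethod = null;
            try {
                // Make a request to the web page looking for
                // <link rel="alternate" type="application/rss+xml" href="wherever web site is" />
                getMethod = new GetMethod(url);
                getMethod.setRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                getMethod.setRequestHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");

                HttpClient httpClient = new HttpClient();
                httpClient.getHttpConnectionManager().getParams().setConnectionTimeout(timeoutMillis);
                httpClient.getHttpConnectionManager().getParams().setSoTimeout(timeoutMillis);

                // executeMethod() returns the HTTP status code, so only parse a 200 OK response
                if (httpClient.executeMethod(getMethod) == HttpStatus.SC_OK) {
                    final SAXException STOP_PARSING = new SAXException("Found rss feed, terminating parsing");
                    final String[] hrefBox = new String[1];

                    DefaultHandler rssFinder = new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName, String name,
                                Attributes attributes) throws SAXException {
                            String type = attributes.getValue("type");
                            if ("link".equalsIgnoreCase(tagName(localName, name))
                                    && ("application/rss+xml".equals(type) || "application/atom+xml".equals(type))
                                    && "alternate".equalsIgnoreCase(attributes.getValue("rel"))) {
                                hrefBox[0] = attributes.getValue("href");
                                throw STOP_PARSING;
                            }
                        }

                        // tag has to be in head, so once we're done looking in head stop processing
                        @Override
                        public void endElement(String uri, String localName, String name) throws SAXException {
                            if ("head".equalsIgnoreCase(tagName(localName, name))) {
                                throw STOP_PARSING;
                            }
                        }
                    };

                    // Use TagSoup's JAXP factory so that malformed real-world HTML can still be parsed
                    SAXParser sp = SAXParserFactory.newInstance(
                            "org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl",
                            Thread.currentThread().getContextClassLoader()).newSAXParser();
                    InputStream is = new BufferedInputStream(getMethod.getResponseBodyAsStream());
                    try {
                        sp.parse(is, rssFinder);
                    } catch (SAXException e) {
                        if (e != STOP_PARSING) { // STOP_PARSING is just an optimisation, nothing to worry about
                            throw e;
                        }
                    } finally {
                        is.close();
                    }

                    if (hrefBox[0] != null) {
                        // The href is often relative (e.g. "/feed"), so resolve it against the page URL
                        rssFeedUrl = new URL(new URL(url), hrefBox[0]).toString();
                        LOG.debug("Found RSS feed in " + url + ": " + rssFeedUrl);
                    } else {
                        LOG.debug("No RSS element found in " + url);
                    }
                }
            } catch (Exception e) {
                LOG.warn("Error when trying to derive RSS feed from " + url, e);
            } finally {
                if (getMethod != null) {
                    getMethod.releaseConnection(); // hand the connection back to the connection manager
                }
            }
            return rssFeedUrl;
        }

        // Use the local name when the parser reports one, otherwise fall back to the qualified name
        private static String tagName(String localName, String qName) {
            return (localName == null || localName.length() == 0) ? qName : localName;
        }
    }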
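For pages that are well-formed XHTML, the same <link rel="alternate"> lookup can be tried with only the JDK's built-in SAX parser. Most real-world HTML is not well-formed, which is why the code above relies on TagSoup. The sketch below assumes the markup has already been downloaded, uses a hypothetical class name (XhtmlFeedLinkFinder), and resolves a relative href against the page URL:

```java
import java.io.StringReader;
import java.net.URI;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical JDK-only variant of the <link rel="alternate"> lookup.
// It only works on well-formed XHTML; real-world HTML generally needs a forgiving parser such as TagSoup.
final class XhtmlFeedLinkFinder {

    private XhtmlFeedLinkFinder() {}

    // Returns the absolute feed URL advertised in the page's <head>, or null if none is found.
    static String findFeedLink(String pageUrl, String xhtml) {
        final String[] href = new String[1];
        final SAXException stop = new SAXException("stop parsing");
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                String type = atts.getValue("type");
                if ("link".equalsIgnoreCase(localName)
                        && "alternate".equalsIgnoreCase(atts.getValue("rel"))
                        && ("application/rss+xml".equals(type) || "application/atom+xml".equals(type))) {
                    href[0] = atts.getValue("href");
                    throw stop; // first matching link wins
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if ("head".equalsIgnoreCase(localName)) {
                    throw stop; // feed links belong in <head>, so stop once it closes
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // populate localName for XHTML elements
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            // Either the deliberate stop signal or a genuine parse error; in both cases
            // report whatever link (if any) was captured before parsing ended.
        }
        if (href[0] == null) {
            return null;
        }
        try {
            return URI.create(pageUrl).resolve(href[0].trim()).toString();
        } catch (IllegalArgumentException e) {
            return null; // page URL or href is not a valid URI
        }
    }
}
```

For example, a page at http://example.com/blog/index.html whose head contains <link rel="alternate" type="application/rss+xml" href="/feed.xml"/> yields http://example.com/feed.xml. Stopping at </head> mirrors the shortcut in the original, since feed links are expected in the head.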

 

 

From: dreamreal@... [mailto:dreamreal@...] On Behalf Of Joseph Ottinger
Sent: Sunday, 5 July 2009 9:35 PM
To: dev@...
Subject: Re: RSS autodetect



IMPORTANT: This e-mail, including any attachments, may contain private or confidential information. If you think you may not be the intended recipient, or if you have received this e-mail in error, please contact the sender immediately and delete all copies of this e-mail. If you are not the intended recipient, you must not reproduce any part of this e-mail or disclose its contents to any other party. This email represents the views of the individual sender, which do not necessarily reflect those of Education.au except where the sender expressly states otherwise. It is your responsibility to scan this email and any files transmitted with it for viruses or any other defects. education.au limited will not be liable for any loss, damage or consequence caused directly or indirectly by this email.