a strange Encoding issue?
Hi everyone,

I want to use Rome to get news entries from the following URL: http://www.trainingpressreleases.co.uk/rss.ashx?incCat=1 but I get an exception: Invalid XML: Error on line 1: Content is not allowed in prolog.

That RSS feed works fine with Firefox, IE and some RSS software I tested.

I tried printing out the raw content returned by the connection, and it looks strange: ?< ? x m l v e r s i o n = .....

How can I use Rome to parse a URL like that? Is it an encoding issue?

Thanks

ian
Re: a strange Encoding issue?
Hi Ian,

What version of Rome are you using, and how are you reading the feed?

The problem is encoding related. The feed is UTF-16, which is a double-byte charset, and the file's first two bytes mark the UTF-16 variant (big endian or little endian): the so-called "byte order mark", or "BOM" for short. When the feed is read, the parser apparently does not recognize the UTF-16 encoding, so it sees some bytes before the opening XML declaration, and that is not allowed.

I made a simple test case:

    import java.io.PrintWriter;
    import java.net.URL;

    import com.sun.syndication.feed.synd.SyndFeed;
    import com.sun.syndication.io.SyndFeedInput;
    import com.sun.syndication.io.SyndFeedOutput;
    import com.sun.syndication.io.XmlReader;

    public class FeedEncodingTest {
        public static void main(String[] args) {
            try {
                URL feedUrl = new URL("http://www.trainingpressreleases.co.uk/rss.ashx?incCat=1");
                SyndFeedInput input = new SyndFeedInput();
                // XmlReader detects the encoding from the stream (BOM, XML prolog)
                XmlReader xr = new XmlReader(feedUrl.openStream());
                System.out.println("Encoding " + xr.getEncoding());
                SyndFeed feed = input.build(xr);
                // Re-encode the parsed feed as UTF-8 for output
                feed.setEncoding("UTF-8");
                PrintWriter pw = new PrintWriter(System.out);
                SyndFeedOutput output = new SyndFeedOutput();
                output.output(feed, pw, true);
                pw.flush();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }

I can parse the feed and convert it to UTF-8 for output without any problem with Rome (tested with Rome 1.0). Could you check whether you can parse and output the feed with the code above?

Greetings,
Martin
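To make the byte order mark idea concrete, here is a minimal sketch of how the first bytes of a stream can be checked for a BOM. This is not Rome's actual detection code. The class name `BomSniffer` and its methods are made up for illustration, and the check is simplified (for example, it does not tell a UTF-32LE BOM apart from a UTF-16LE one).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Rome.
final class BomSniffer {

    // Returns the encoding implied by a leading BOM, or null if none is found.
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            // Simplification: a UTF-32LE BOM (FF FE 00 00) would also land here.
            return "UTF-16LE";
        }
        return null;
    }

    // Compares the first bytes of data against the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first.
        byte[] sample = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16);
        System.out.println(detect(sample));
    }
}
```

A parser that skips this kind of check and assumes a single-byte encoding ends up treating the BOM bytes as content before the XML declaration, which is consistent with the "Content is not allowed in prolog" error.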
Re: a strange Encoding issue?
I am quite certain that what you are seeing is a UTF-16 XML file that is being declared as UTF-8 in the HTTP header.

Martin is definitely correct. The main thing is to make sure you are using the XmlReader, or RomeFetcher, when you are parsing. There is a good bit of dark magic in there to deal with everyone's broken content types on the internet.
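As a rough illustration of why the content printed as "?< ? x m l", the sketch below decodes the same UTF-16 bytes once with the wrong charset and once with the right one. The class name `MisdecodeDemo` is hypothetical, and ISO-8859-1 is used as the wrong charset only because it maps every byte to a character; a reader that trusts a UTF-8 header would produce similar garbage.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; shows the effect of decoding UTF-16 bytes
// with a single-byte charset versus a BOM-aware UTF-16 decoder.
final class MisdecodeDemo {

    // The start of an XML declaration, encoded as UTF-16 with a BOM.
    static byte[] sampleBytes() {
        return "<?xml".getBytes(StandardCharsets.UTF_16);
    }

    // Wrong: every byte becomes one character, including the BOM and NUL bytes.
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Right: the UTF-16 decoder consumes the BOM and pairs the bytes up.
    static String asUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        // Replace NUL characters with spaces so the spacing is visible on a console.
        System.out.println(asLatin1(bytes).replace('\u0000', ' '));
        System.out.println(asUtf16(bytes));
    }
}
```

The first line of output has two stray characters followed by the letters separated by gaps, the second is the clean declaration start. An XML parser handed the first form sees characters before the declaration and gives up.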
On Tue, Apr 14, 2009 at 4:36 PM, Martin Kurz <info@...> wrote:
> Hi Ian,

--
:Robert "kebernet" Cooper
::kebernet@...
Alice's cleartext
Charlie is the attacker
Bob signs and encrypts
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x9E8759F8
Re: a strange Encoding issue?
Thanks for the help, Martin.

I am using Rome 1.0. Your explanation is really helpful.

Ian