Input '&' symbol causing SystemID Unknown error

by Oscar Usifer

Folks,

My input XML <entry></entry> values contain '&' characters, e.g. "...<entry>Texas A&M</entry>", which results in the error below. I am reading the XML input from a SQL database, where it is stored as CLOBs, so no charset is specified for the input. Any ideas how to resolve this?

Thanks,
OSC

Reported Error:

  SystemId Unknown; Line #1; Column #5383; The reference to entity "M" must end with the ';' delimiter.

Source Code:

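import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import java.sql.Clob;
import java.sql.SQLException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ChatEncoder {
...
   // Transforms the XML held in the CLOB with the XSLT_RAW stylesheet
   // and writes the result to out.
   public static void encodeHTML(Clob clob, Writer out)
      throws SQLException, IOException, TransformerException, TransformerConfigurationException
   {
      TransformerFactory tFactory = TransformerFactory.newInstance();
      StringReader sr = new StringReader(XSLT_RAW);
      StreamSource ss = new StreamSource(sr);
      Transformer transformer = tFactory.newTransformer(ss);

      // The source is parsed here, so input that is not well-formed fails at this call.
      transformer.transform(
               new StreamSource(getClobReader(clob)),
               new StreamResult(out));
   }

...
}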



Re: Input '&' symbol causing SystemID Unknown error

by Michael Ludwig-6

Oscar Usifer wrote on 14.09.2009 at 17:14:33 (-0500):
>
> My input XML <entry></entry> values are containing '&' input symbols,
> e.g. "...<entry>Texas A&M</entry>"

That's a syntax error in XML. The ampersand is special. It is used for
entity references, built-in and user-defined, like &lt; or &nbsp;, and
also for numeric character references, like &#100;. The correct version
would use &amp; to represent the ampersand:

  <entry>Texas A&amp;M</entry>

> which results in errors as follows. I am reading the XML input from a
> SQL database, and thus appear as input CLOBs, so no specification is
> being set on the input charset.

The charset should be UTF-8, UTF-16 with a BOM, or contained in the XML
declaration.

> Any ideas how to resolve?

Well, the data is broken. Garbage in, garbage out. Fix the process that
writes garbage into the database.
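One way to fix the writing side is to let an XML API do the escaping instead of concatenating strings. The following is only a minimal sketch under that assumption; the class and method names (EntryWriterSketch, writeEntry) are hypothetical and do not come from the original code.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of the original ChatEncoder class.
final class EntryWriterSketch {

    // Serializes a single <entry> element; the stream writer escapes
    // markup characters in the text content on our behalf.
    static String writeEntry(String text) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartElement("entry");
            xml.writeCharacters(text);   // raw text goes in, escaped text comes out
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not serialize entry", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The ampersand in the value is written as &amp; in the output.
        System.out.println(writeEntry("Texas A&M"));
    }
}
```

Storing the string returned by writeEntry in the database would give the transformer well-formed input later on.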

--
Michael Ludwig

Re: Input '&' symbol causing SystemID Unknown error

by Oscar Usifer

Oh boy, this is an old question. It seems the best solution is to search the input stream and replace each '&' with the character reference '&#38;'. This resolved my issue.

Thanks
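The search-and-replace workaround described above could be sketched roughly as follows. This is an assumption about how such a repair might look, not the code the poster actually used; AmpersandRepairSketch and escapeBareAmpersands are hypothetical names. The pattern leaves anything that already looks like an entity or character reference alone, so text such as "AT&T;" would be mistaken for a reference, and ampersands inside CDATA sections would be rewritten too.

```java
import java.util.regex.Pattern;

// Hypothetical repair step for stored XML that contains bare ampersands.
final class AmpersandRepairSketch {

    // An '&' that is NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
        "&(?!(?:[A-Za-z_:][A-Za-z0-9_:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare ampersand with the predefined entity &amp;.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("<entry>Texas A&M</entry>"));
        // References that are already escaped are left untouched.
        System.out.println(escapeBareAmpersands("<entry>Texas A&amp;M</entry>"));
    }
}
```

The repaired string can then be wrapped in a StringReader and passed to the StreamSource in place of the raw CLOB reader. This only patches the symptom; the data in the database stays malformed.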




Re: Input '&' symbol causing SystemID Unknown error

by keshlam

Ampersand is a reserved character in XML. See the XML Recommendation for a description of how Entity References and Character References work; to express an ampersand as a character, you must escape it as '&#38;' or '&#x26;', or via the predefined entity '&amp;'.

(Or, in text content of elements, you can use a <![CDATA[]]> section. But I strongly recommend against going that route; in the long run it will almost certainly cause you more trouble than it will solve.)
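To illustrate the options just listed, here is a small sketch that parses each form with the JDK's DOM parser and reads back the text content. The class and method names (AmpersandFormsSketch, entryText) are made up for the example. The decimal reference &#38; and the hexadecimal reference &#x26; both name the code point of '&', so all four inputs should come back as the same text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; only standard JAXP calls are used.
final class AmpersandFormsSketch {

    // Parses a tiny document and returns the text of its root element.
    static String entryText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String[] forms = {
            "<entry>Texas A&#38;M</entry>",          // decimal character reference
            "<entry>Texas A&#x26;M</entry>",         // hexadecimal character reference
            "<entry>Texas A&amp;M</entry>",          // predefined entity
            "<entry><![CDATA[Texas A&M]]></entry>"   // CDATA section
        };
        for (String form : forms) {
            System.out.println(form + " -> " + entryText(form));
        }
    }
}
```

Passing the unescaped "<entry>Texas A&M</entry>" to entryText fails with a parse error, which is the same well-formedness problem the transformer reported.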


______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
 -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Re: Input '&' symbol causing SystemID Unknown error

by Oscar Usifer

OK, I see I will need to handle the less-than symbol ('<') as well, based on what the spec says.

Thanks
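Handling '<' as well changes where the escaping has to happen: in a complete document '<' also starts every tag, so it can only be escaped safely on the individual text values before they are wrapped in markup. A minimal sketch of such a per-value escaper, with hypothetical names (XmlTextEscaperSketch, escapeText), might look like this:

```java
// Hypothetical helper for escaping raw text values, not whole documents.
final class XmlTextEscaperSketch {

    // Escapes the characters that are special in XML element content.
    static String escapeText(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;  // optional in most places, harmless
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape the value first, then add the surrounding tags.
        String entry = "<entry>" + escapeText("Texas A&M, GPA < 4.0") + "</entry>";
        System.out.println(entry);
    }
}
```

Attribute values would additionally need their quote characters escaped, which this sketch does not cover.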


> ----- Original Message -----
> From: keshlam@...
> To: "Oscar Usifer" <oscaruser@...>
> Cc: xalan-j-users@...
> Subject: Re: Input '&' symbol causing SystemID Unknown error
> Date: Mon, 14 Sep 2009 20:07:47 -0400
>
>
> Ampersand is a reserved character in XML. See the XML Recommendation for a
> description of how Entity References and Character References work; to
> express an ampersand as a character, you must escape it as '&#38;' or
> '&#x26;', or via the predefined entity '&amp;'.
>
> (Or, in text content of elements, you can use a <![CDATA[]]> section. But
> I strongly recommend against going that route; in the long run it will
> almost certainly cause you more trouble than it will solve.)
>

