Re: UTF-8 vs UTF-16 File Polling

by jsexton0

Hi -

My observation is that file polling works only if the file is actually encoded as UTF-8 and fails otherwise, regardless of what the XML header says (it can say anything). It also appears not to respect the BOM.
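
For reference, respecting the BOM would mean looking at the first bytes of the file before choosing a decoder. A minimal Java sketch of that check follows. The class and method names are hypothetical, this is not how the file binding is implemented, and the sketch ignores UTF-32 (whose little-endian BOM also begins with FF FE).

```java
final class BomSniffer {
    /** Returns the charset implied by a leading BOM, or null when no known BOM is present. */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data, as unsigned values, against the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // BOM followed by '<' encoded as UTF-16 LE (what Windows calls "Unicode").
        byte[] windowsUnicode = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detectBom(windowsUnicode));
    }
}
```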

My project encoding is the default, UTF-8.

For sample inputs I tried UTF-16, both big-endian and little-endian, with the BOM present. The file I ultimately want to read comes from another system, in UTF-16 LE (what Windows calls "Unicode"). I generated the samples with Java code and checked them with hex dumps to make sure I had what I expected. I also used Windows Notepad and save-as operations to change the encodings. Windows correctly recognized all the samples as what it calls "ANSI", "UTF-8", "Unicode" or "Unicode big endian", Windows "Unicode" being UTF-16 LE.
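
The Java code used for the samples is not shown in this thread. As an illustration only, samples of this kind could be produced and checked roughly as follows; the class name, method names and sample text are assumptions. Note that Java's UTF_16LE and UTF_16BE charsets do not write a BOM when encoding, so the sketch prepends it by hand.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Samples {
    /** Encodes text as UTF-16 in the requested byte order, with the matching BOM in front. */
    static byte[] encodeWithBom(String text, boolean littleEndian) {
        byte[] bom = littleEndian
                ? new byte[]{(byte) 0xFF, (byte) 0xFE}
                : new byte[]{(byte) 0xFE, (byte) 0xFF};
        byte[] body = text.getBytes(
                littleEndian ? StandardCharsets.UTF_16LE : StandardCharsets.UTF_16BE);
        byte[] out = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, out, 0, bom.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    /** Formats the first 'limit' bytes as space-separated hex pairs, like a hex dump line. */
    static String hex(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(limit, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><a/>";
        System.out.println(hex(encodeWithBom(xml, true), 8));   // little-endian sample
        System.out.println(hex(encodeWithBom(xml, false), 8));  // big-endian sample
    }
}
```

A little-endian sample should start with FF FE 3C 00 and a big-endian one with FE FF 00 3C, which is what to look for in the hex dump.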


A few questions:

- What is the encoding of the NetBeans project in which you are doing the activities below?
  The default project encoding is UTF-8, but it can be changed in the project properties.

- Was the XML file generated by NetBeans, i.e. File -> New -> XML?

  If so, what is its encoding tag?
  The default encoding for such files should be the same as the project encoding.

  However, the encoding tag in a file should be used instead of the project encoding; that is, it overrides the project encoding.

I'm wondering whether the project encoding is being used instead of the encoding declared in the files; that is, as you mention,
even if you change the header to UTF-16, it works if the file itself is UTF-8, but otherwise it does not.

Thanks - Ken


jsexton0 wrote:
I've created a file polling WSDL that picks up an XML file. This works fine, except that the process errors out with "Content is not allowed in prolog" if the input file is anything but UTF-8. The XSD used to create the WSDL is UTF-16, and is labeled as such in its header.

If I change the header of the input file to say UTF-16, even though the file is actually UTF-8 and the XSD header says UTF-16, it still works. But if I save the input file as "Unicode" (UTF-16 LE) and label it UTF-16 in its header, the file pick-up fails.
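
One way this kind of error can arise is when the bytes are decoded with the wrong charset before the XML parser sees them. Whether the file binding does this is not confirmed anywhere in this thread, so the Java sketch below is only a demonstration of the mechanism using the JDK's own parser. The class name, method names and sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PrologDemo {
    /** A small UTF-16 LE document with BOM, similar to a Windows "Unicode" sample. */
    static byte[] utf16LeSample() {
        byte[] body = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><order/>"
                .getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    /** The parser receives raw bytes and can detect the encoding itself. */
    static boolean parsesFromBytes(byte[] data) {
        return parses(new InputSource(new ByteArrayInputStream(data)));
    }

    /** The caller decodes with an assumed charset; the parser receives characters only. */
    static boolean parsesAfterDecoding(byte[] data, Charset assumed) {
        return parses(new InputSource(new StringReader(new String(data, assumed))));
    }

    private static boolean parses(InputSource source) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the input as not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = utf16LeSample();
        System.out.println("raw bytes: " + parsesFromBytes(sample));
        System.out.println("decoded as UTF-8 first: "
                + parsesAfterDecoding(sample, StandardCharsets.UTF_8));
    }
}
```

When the UTF-16 bytes are decoded as UTF-8, the BOM bytes become replacement characters ahead of the first `<`, which the parser treats as content before the prolog.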

Eventually I need a UTF-16 file to be read.  Is there some trick to picking up a UTF-16 XML file using a file polling WSDL?
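
No such trick is given in this thread. A possible workaround, offered only as a sketch, is to convert the incoming file to UTF-8 before it lands in the polled directory. The Java code below assumes the source file carries a UTF-16 BOM; the class name, method name and command-line arguments are hypothetical, and it does nothing to change how the file binding itself behaves.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Utf16ToUtf8 {
    /** Converts BOM-marked UTF-16 XML to UTF-8; input without a UTF-16 BOM is returned unchanged. */
    static byte[] transcode(byte[] input) {
        Charset source;
        if (input.length >= 2 && (input[0] & 0xFF) == 0xFF && (input[1] & 0xFF) == 0xFE) {
            source = StandardCharsets.UTF_16LE;
        } else if (input.length >= 2 && (input[0] & 0xFF) == 0xFE && (input[1] & 0xFF) == 0xFF) {
            source = StandardCharsets.UTF_16BE;
        } else {
            return input; // assumption: no UTF-16 BOM means the file needs no conversion
        }
        // Skip the two BOM bytes, decode the rest, and update the encoding declaration.
        String text = new String(input, 2, input.length - 2, source);
        text = text.replaceFirst("(<\\?xml[^>]*encoding=[\"'])[^\"']*([\"'])", "$1UTF-8$2");
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: UTF-16 file from the other system; args[1]: UTF-8 file for the poller.
        byte[] converted = transcode(Files.readAllBytes(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), converted);
    }
}
```

Writing the converted file to a temporary name and renaming it into the polled directory would avoid the poller picking up a half-written file.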

On a side note, the XSD that defines this file hangs the GUI BPEL editor completely every time an attempt is made to access it in any way, such as in an Assign. Editing the BPEL XML by hand in the source view worked perfectly, however. The XSD and the BPEL both validate and compile fine.

Thanks
