Doctype and encoding storage

by Benoit Mercier

Hi,

I have been using eXist daily on production systems for about two years
and I have just discovered today that I am not able to store and
retrieve an XML document without information loss ;-)

First of all, the following line is removed when I store documents
that have it as their first line:

<?xml version="1.0" encoding="UTF-16"?>

The document is stored without error (via the interactive client or the
Java XML-RPC API), but the line has disappeared when the document is
retrieved (via the interactive client or the Java XML-RPC API).  Via the
REST interface the line comes back, but with UTF-8 encoding (I suppose it
has been rewritten).

I notice the same behaviour with DOCTYPE declarations like this one:

<!DOCTYPE dictionnaire SYSTEM "http://toto.ca/blabla.dtd">

Note that the given DTD isn't recorded in catalog.xml and I am currently
using eXist 1.2.6 on Linux (Ubuntu Server).

Am I doing something wrong?  How can I keep these lines when storing a
document in the eXist database?  The encoding is very important to me:
how can I be sure that eXist will store and retrieve it correctly?

Thank you very much in advance for your help!

Best regards,

Benoit (mercibe)

------------------------------------------------------------------------------
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Doctype and encoding storage

by Adam Retter-3

Benoit,

From the REST interface with XQuery you could use the serialization
options to output the encoding and the document type declaration:

http://exist-db.org/xquery.html#serialization

That should enable you to make your output look the same as your input.

Cheers, Adam.


2009/6/23 Benoit Mercier <Benoit.Mercier@...>:

> Am I doing something wrong?  How can I keep these lines when storing a
> document in the eXist database?  The encoding is very important to me:
> how can I be sure that eXist will store and retrieve it correctly?



--
Adam Retter

eXist Developer
{ United Kingdom }
adam@...
irc://irc.freenode.net/existdb

Re: Doctype and encoding storage

by Wolfgang Meier-2

> Am I doing something wrong?  How can I keep these lines when storing a
> document in the eXist database?  The encoding is very important to me:
> how can I be sure that eXist will store and retrieve it correctly?

Neither the XML declaration nor the doctype is part of the document
model. With respect to the character encoding, eXist relies on Java's
Unicode handling, so once the text of the document has been parsed, it
is processed as Unicode, no matter what encoding the file used on
disk. When writing out a document, it is the job of the serializer to
choose an output encoding. Use the serialization options to determine
which encoding is used.
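
To illustrate the point about parsing and serializing, here is a minimal sketch that uses only the JDK's own DOM parser and identity Transformer, not eXist's storage or serializer. The class name EncodingRoundTrip and the sample document are made up for the example. It parses a UTF-16 document and writes it back out in whichever encoding the caller asks for; the text content should survive either way, because the encoding is chosen at serialization time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical demo class: plain JDK APIs only, no eXist involved.
public class EncodingRoundTrip {

    /** Parses XML bytes; the parser detects the input encoding and yields Unicode text. */
    public static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    /** Serializes the document; the output encoding is picked here, not at parse time. */
    public static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Sample input: a UTF-16 document with non-ASCII text.
        String source = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<dictionnaire>\u00e9t\u00e9</dictionnaire>";
        Document doc = parse(source.getBytes(StandardCharsets.UTF_16));

        for (String enc : new String[] {"UTF-8", "UTF-16"}) {
            byte[] bytes = serialize(doc, enc);
            // Re-parse the output to check that the text itself is unchanged.
            String text = parse(bytes).getDocumentElement().getTextContent();
            System.out.println(enc + ": " + bytes.length + " bytes, text = " + text);
        }
    }
}
```

The byte counts differ between the two encodings while the re-parsed text stays the same, which is the behaviour described above: the declared encoding of the original file is not stored with the content.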

eXist also stores the doctype declaration in the document's metadata,
but will not print it out by default when serializing the document
(mainly to avoid potential issues with internal entity declarations).
DTDs are always a bit problematic as they are themselves not XML.
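
The same idea can be sketched for the doctype, again with plain JDK APIs rather than eXist's own serializer (the class name DoctypeRoundTrip is hypothetical). The parser keeps the declaration as metadata on the document (`getDoctype()`), separate from the element tree, and the serializer only writes it when asked to through an output property. The entity resolver below supplies an empty stand-in for the DTD so that nothing is fetched from the network; that is an assumption made for the demo.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical demo class: plain JDK APIs only, no eXist involved.
public class DoctypeRoundTrip {

    /** Parses XML without fetching the external DTD (an empty stand-in is returned). */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes the root element and re-emits the DOCTYPE from the document's metadata. */
    public static String serializeWithDoctype(Document doc) throws Exception {
        DocumentType doctype = doc.getDoctype();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        if (doctype != null && doctype.getSystemId() != null) {
            // The DOCTYPE is written only because this property is set explicitly.
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
        }
        StringWriter out = new StringWriter();
        // Start from the root element so the DOCTYPE comes solely from the property.
        t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String source = "<!DOCTYPE dictionnaire SYSTEM \"http://toto.ca/blabla.dtd\">"
                + "<dictionnaire/>";
        Document doc = parse(source);
        System.out.println("system id: " + doc.getDoctype().getSystemId());
        System.out.println(serializeWithDoctype(doc));
    }
}
```

Only the system identifier is carried over in this sketch; an internal subset with entity declarations would need separate handling, which is the kind of complication mentioned above.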

Wolfgang

Re: Doctype and encoding storage

by John Craft

> eXist also stores the doctype declaration in the document's metadata,
> but will not print it out by default when serializing the document
> (mainly to avoid potential issues with internal entity declarations).
> DTDs are always a bit problematic as they are themselves not XML.

Is it possible to override the default and have eXist output the doctype
declaration during serialization?

Thanks.

John Craft