[plexus-dev] [jira] Created: (PLX-152) Plexus xml pull parser doesn't support entities for non-english characters



by JIRA jira@codehaus.org


Plexus xml pull parser doesn't support entities for non-english characters
--------------------------------------------------------------------------

         Key: PLX-152
         URL: http://jira.codehaus.org/browse/PLX-152
     Project: Plexus
        Type: Bug
 Reporter: Eirik Maus



It is common to declare XML entities for non-English characters so that these characters can be used in XML files across parsers and character sets. Such entities cause the Plexus XML pull parser to fail, a bug that makes many XML files unparsable.

Consider the following fragment from our Maven project.xml file:


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE project [
    <!ENTITY OSlash "ø">
    <!ENTITY CapitalOSlash "Ø">
]>
<project>
    <pomVersion>3</pomVersion>
.....
    <developers>
        <!-- 'timezone' used as phone number field -->
        <developer>
            <name>Marit Finne J&OSlash;rgensen</name>
            <id>mfj</id>
            <email>marit ... </email>
        </developer>
    </developers>
</project>


This works with Maven 1.0. To use the project.xml file with Maven 1.1, the &OSlash; references must be replaced with the literal letter 'ø'; otherwise the Plexus pull parser throws an exception on the entity.

While failing to support a literal 'ø' could arguably be considered a bug in the other XML parsers, the entity workaround is legal XML and should be parsable by all parsers.
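The claim that the entity workaround is legal XML can be checked against the JDK's built-in JAXP parser. This is only a sketch under stated assumptions: the class and method names (EntityCheck, developerName) are hypothetical, the document is a trimmed, well-formed version of the fragment above, and it exercises the JDK parser rather than the Plexus pull parser, so it does not reproduce the Plexus failure itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EntityCheck {
    // Trimmed, well-formed version of the project.xml fragment from the report.
    // The entities are declared in the internal DTD subset, exactly as in the report.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE project [\n"
        + "  <!ENTITY OSlash \"\u00F8\">\n"
        + "  <!ENTITY CapitalOSlash \"\u00D8\">\n"
        + "]>\n"
        + "<project><developers><developer>"
        + "<name>Marit Finne J&OSlash;rgensen</name>"
        + "</developer></developers></project>";

    // Hypothetical helper: parses the document and returns the text of the first <name>.
    static String developerName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Replace entity references with their declared text (the default; set for clarity).
            factory.setExpandEntityReferences(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Parsing from a Reader, so the character encoding is already decoded.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("name").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Expected: Marit Finne Jørgensen
        System.out.println(developerName(XML));
    }
}
```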


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (PLXUTILS-12) Plexus xml pull parser doesn't support entities for non-english characters

by JIRA jira@codehaus.org



    [ http://jira.codehaus.org/browse/PLXUTILS-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=128366#action_128366 ]

Benjamin Bentmann commented on PLXUTILS-12:
-------------------------------------------

Why bother with entity declarations if you can insert the character literally? The XML declaration lets you select your preferred encoding; e.g. use UTF-8 and happily write almost any character in the world. All JVMs must support UTF-8, and all proper text editors support it.
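The point about the encoding declaration can be illustrated with a small sketch, again using the JDK's JAXP parser rather than the Plexus one. The hypothetical LiteralChars.roundTrip helper encodes a document in a given charset, declares that same charset in the XML declaration, and parses the raw bytes, so the parser has to honour the declaration to recover the literal 'ø'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class LiteralChars {
    // Hypothetical helper: writes <name>text</name> as bytes in the given charset,
    // with a matching encoding="..." declaration, then parses the bytes back.
    static String roundTrip(String text, String charsetName) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charsetName + "\"?>\n"
            + "<name>" + text + "</name>";
        byte[] bytes = xml.getBytes(Charset.forName(charsetName));
        try {
            // Parsing from an InputStream, so the parser decodes using the declaration.
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes))
                .getDocumentElement()
                .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String name = "Marit Finne J\u00F8rgensen";
        // Both print true: the literal character survives when bytes match the declaration.
        System.out.println(roundTrip(name, "UTF-8").equals(name));
        System.out.println(roundTrip(name, "ISO-8859-1").equals(name));
    }
}
```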

If you really want to write ASCII only, you can still use numeric character references such as "&#xuuuu;" (e.g. "&#xF8;" for 'ø'), which XML parsers understand out of the box.
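For the ASCII-only route, a sketch like the following could generate the numeric character references. toAsciiWithCharRefs and rootText are hypothetical helpers, not part of Plexus or the JDK; the escaping helper assumes that '&', '<' and '>' in its input are already escaped, and the parsing check uses the JDK's JAXP parser. No DTD is needed for the references to resolve.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CharRefs {
    // Hypothetical helper: replaces every non-ASCII code point with a hexadecimal
    // numeric character reference (&#x...;). ASCII characters are copied unchanged,
    // so '&', '<' and '>' must already be escaped by the caller.
    static String toAsciiWithCharRefs(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            }
        });
        return out.toString();
    }

    // Parses a document that has no DTD and returns the root element's text content.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String name = "Marit Finne J\u00F8rgensen";
        String ascii = toAsciiWithCharRefs(name);
        System.out.println(ascii); // Marit Finne J&#xF8;rgensen
        System.out.println(rootText("<name>" + ascii + "</name>").equals(name)); // true
    }
}
```

Because the helper walks code points rather than UTF-16 chars, a character outside the Basic Multilingual Plane becomes a single reference (e.g. &#x1F600;) instead of two references to lone surrogates, which XML does not allow.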

> Plexus xml pull parser doesn't support entities for non-english characters
> --------------------------------------------------------------------------
>
>                 Key: PLXUTILS-12
>                 URL: http://jira.codehaus.org/browse/PLXUTILS-12
>             Project: Plexus Utils
>          Issue Type: Bug
>            Reporter: Eirik Maus
>




---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email