XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

by Klotz, Leigh

JDOM 1.1 won't create elements whose names contain characters in the
following ranges:
  Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH
LATIN SMALL LETTER Z)
  Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH
LATIN CAPITAL LETTER Z)

The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
84 of the XML 1.0 Recommendation for its table of allowed characters.

However, according to http://www.w3.org/TR/REC-xml/ the whole of
Appendix B (which contains Production 84) is orphaned and is no longer
used within the recommendation to determine names.  The XML Rec instead
uses production [4] for NameStartChar, [4a] for NameChar, and [5] for
Name.

The productions at [4] and [4a] are considerably smaller than those of
Appendix B, and are more inclusive, providing for greater utility in
I18N applications of XML.
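
To illustrate how compact the newer productions are, here is a minimal
sketch of a checker built from the ranges in productions [4], [4a], and
[5] of XML 1.0 Fifth Edition.  The class and method names are
hypothetical and are not part of JDOM; the sketch also checks only the
raw XML Name production, so it does not apply any namespace-related
restrictions (such as rejecting colons) that a real element-name
verifier may want.

```java
// Hypothetical helper, not JDOM API: a sketch of XML 1.0 Fifth Edition
// productions [4] NameStartChar, [4a] NameChar, and [5] Name.
final class Xml10NameSketch {

    // Inclusive code point ranges from production [4] NameStartChar.
    private static final int[][] NAME_START = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
        {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
        {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
        {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // Additional ranges that production [4a] NameChar allows on top of [4].
    private static final int[][] NAME_EXTRA = {
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
        {0x300, 0x36F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Production [4].
    static boolean isNameStartChar(int cp) {
        return inRanges(cp, NAME_START);
    }

    // Production [4a]: NameStartChar plus the extra ranges.
    static boolean isNameChar(int cp) {
        return isNameStartChar(cp) || inRanges(cp, NAME_EXTRA);
    }

    // Production [5]: NameStartChar (NameChar)*.
    // Iterates by code point so supplementary characters are handled.
    static boolean isName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

With this sketch, Xml10NameSketch.isName("\uFF41\uFF42") accepts a name
made of fullwidth letters, because U+FF21-U+FF3A and U+FF41-U+FF5A both
fall inside the [#xFDF0-#xFFFD] range of production [4].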

Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
(Non-Normative), the characters I mention above are not only allowed,
but encouraged for use in XML Names, because the Unicode ID_Start and
ID_Continue properties of these code points are True.

The XML REC says:

    1. The first character of any name should have a Unicode property of
ID_Start, or else be '_' #x5F.
    2. Characters other than the first should have a Unicode property of
ID_Continue, or ...

You can see that ID_Start and ID_Continue are True on the individual
pages for the small letters here:
http://unicode.org/cldr/utility/character.jsp?a=FF41
to
http://unicode.org/cldr/utility/character.jsp?a=FF5A
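
The same check can be done locally rather than page by page.  The
sketch below uses java.lang.Character.isUnicodeIdentifierStart and
isUnicodeIdentifierPart, which I am assuming are a close enough
approximation of ID_Start and ID_Continue for these code points; the
exact correspondence depends on the Java version and its Unicode data.
The class and method names are hypothetical.

```java
// Hypothetical helper: checks that every code point in an inclusive
// range is accepted by Java's Unicode identifier predicates, used here
// as an approximation of the ID_Start and ID_Continue properties.
final class IdPropertySketch {

    static boolean allIdStartAndContinue(int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isUnicodeIdentifierStart(cp)
                    || !Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Fullwidth Latin capital and small letters.
        System.out.println(allIdStartAndContinue(0xFF21, 0xFF3A));
        System.out.println(allIdStartAndContinue(0xFF41, 0xFF5A));
    }
}
```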

I recommend that org.jdom.Verifier.isXMLLetter be updated to use
productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
It's quite likely that some of the other character class verifiers need
updating as well, but I didn't examine them.

Leigh.
