XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

JDOM 1.1 won't create elements whose characters are in the following ranges:

Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH LATIN SMALL LETTER Z)
Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH LATIN CAPITAL LETTER Z)

The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production 84 of the XML 1.0 Recommendation for its table of allowed characters.

However, according to http://www.w3.org/TR/REC-xml/ the whole of Appendix B (which contains Production 84) is obsolete and is not used within the recommendation. The XML Rec instead uses production [4] for NameStartChar and [4a] for NameChar.

The productions at [4] and [4a] are considerably smaller than those of Appendix B, and are more inclusive, providing for greater utility in I18N applications of XML.

Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J (Non-Normative), the characters I mention above are not only allowed, but encouraged for use in XML Names, because the Unicode ID_Start and ID_Continue properties of these code points are True.

The XML REC says:

1. The first character of any name should have a Unicode property of ID_Start, or else be '_' #x5F.
2. Characters other than the first should have a Unicode property of ID_Continue, or ...

You can see that ID_Start and ID_Continue are True on the individual pages for the small letters here:
http://unicode.org/cldr/utility/character.jsp?a=FF41
to
http://unicode.org/cldr/utility/character.jsp?a=FF5A

I recommend that org.jdom.Verifier.isXMLLetter be updated to use production [4], [4a], and [5] of XML 1.0 Fifth Edition.

It's quite likely that some of the other character class verifiers need updating as well, but I didn't examine them.

Leigh.

_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...
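To make the difference concrete, here is a minimal Java sketch of what a check based on the Fifth Edition productions might look like. It is not JDOM code: the class name XmlNameChars and its methods are hypothetical, the ranges are transcribed from productions [4] (NameStartChar) and [4a] (NameChar) of XML 1.0 Fifth Edition, and Character.isUnicodeIdentifierStart is used only as a rough stand-in for the Unicode ID_Start property.

```java
public class XmlNameChars {

    // NameStartChar, production [4] of XML 1.0 Fifth Edition (hypothetical helper).
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar, production [4a]: NameStartChar plus a few extra characters.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // Name, production [5]: one NameStartChar followed by zero or more NameChar.
    static boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }

    public static void main(String[] args) {
        String fullwidth = "\uFF41\uFF42\uFF43"; // FULLWIDTH LATIN SMALL LETTERS A, B, C
        System.out.println(isName(fullwidth));   // falls in the #xFDF0-#xFFFD range of [4]
        System.out.println(isName("1abc"));      // a digit is a NameChar but not a NameStartChar
        // Approximation of ID_Start using the JDK's own identifier rules.
        System.out.println(Character.isUnicodeIdentifierStart(0xFF41));
    }
}
```

The range checks are deliberately written as plain comparisons so they can be compared line by line against the productions in the Recommendation.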
|
|
Re: XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Note that this had a pretty good debate on xml-dev (while our list was down):
http://markmail.org/message/wqcmohlf7srpqhkl

General consensus seems to be the current behavior is the lesser of two evils.

-jh-

On Mar 19, 2009, at 2:50 PM, Klotz, Leigh wrote:

> JDOM 1.1 won't create elements whose characters are in the following ranges:
>
> Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH LATIN SMALL LETTER Z)
> Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH LATIN CAPITAL LETTER Z)
>
> The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
> 84 of the XML 1.0 Recommendation for its table of allowed characters.
>
> However, according to http://www.w3.org/TR/REC-xml/ the whole of
> Appendix B (which contains Production 84) is obsolete and is not used
> within the recommendation. The XML Rec instead uses production [4] for
> NameStartChar and [4a] for NameChar.
>
> The productions at [4] and [4a] are considerably smaller than those of
> Appendix B, and are more inclusive, providing for greater utility in
> I18N applications of XML.
>
> Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
> (Non-Normative), the characters I mention above are not only allowed,
> but encouraged for use in XML Names, because the Unicode ID_Start
> and ID_Continue properties of these code points are True.
>
> The XML REC says:
>
> 1. The first character of any name should have a Unicode property of
>    ID_Start, or else be '_' #x5F.
> 2. Characters other than the first should have a Unicode property of
>    ID_Continue, or ...
>
> You can see that ID_Start and ID_Continue are True on the individual
> pages for the small letters here:
> http://unicode.org/cldr/utility/character.jsp?a=FF41
> to
> http://unicode.org/cldr/utility/character.jsp?a=FF5A
>
> I recommend that org.jdom.Verifier.isXMLLetter be updated to use
> production [4], [4a], and [5] of XML 1.0 Fifth Edition.
>
> It's quite likely that some of the other character class verifiers
> need updating as well, but I didn't examine them.
>
> Leigh.
|
|
RE: XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

> General consensus seems to be the current behavior is the
> lesser of two evils.

I don't agree. While I'm definitely among those who think the XML spec shouldn't have been changed in this way, I think the best way of minimising the damage is for everyone now to move forward.

Michael Kay
http://www.saxonica.com/
|
|
Re: XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Right now Xerces (JDOM's default parser) only claims to support the "Fourth" edition.

Brad

"Michael Kay" writes:

> > General consensus seems to be the current behavior is the
> > lesser of two evils.
>
> I don't agree. While I'm definitely among those who think the XML spec
> shouldn't have been changed in this way, I think the best way of minimising
> the damage is for everyone now to move forward.
>
> Michael Kay
> http://www.saxonica.com/
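One way to see what a given parser actually accepts is to feed it a tiny document and watch for a well-formedness error. The following is only a sketch under stated assumptions: it uses whatever SAX parser JAXP finds on the classpath (which may or may not be the Xerces build discussed here), and the class name ParserNameProbe and the method parses are hypothetical. The result for the fullwidth element name is intentionally not stated, since it depends on which edition of the Recommendation the installed parser implements.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ParserNameProbe {

    // Returns true if the JAXP-provided SAX parser accepts the document as well-formed.
    static boolean parses(String xml) {
        try {
            // DefaultHandler rethrows fatal errors, so a rejected name surfaces as SAXException.
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Configuration or I/O trouble is not a verdict on the document itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Element named with FULLWIDTH LATIN SMALL LETTER A (U+FF41).
        String fullwidthElement = "<\uFF41/>";
        System.out.println("ASCII name accepted:     " + parses("<a/>"));
        System.out.println("Fullwidth name accepted: " + parses(fullwidthElement));
    }
}
```

If the parser rejects the fullwidth name, loosening the name check on the output side alone would let JDOM write documents that the same parser cannot read back.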
|
|
Re: XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Michael Kay wrote:

> I don't agree. While I'm definitely among those who think the XML spec
> shouldn't have been changed in this way, I think the best way of minimising
> the damage is for everyone now to move forward.

I think the best way of minimising the damage is for everyone now to stay put. :-)

But given that everyone isn't going to do anything, I think the minimal damage is to avoid putting anything that requires XML 1.1 (or 1.0.5) on the wire. That way documents produced by JDOM will have maximum interoperability.

Since few people (possibly no people) actually need the changes imposed by XML > 1.0, it is better to flag any 1.0 illegal name characters as early as possible as unintentional bugs caused by character set confusion that should be corrected. It's not like the world is crying out to use musical symbols as element names or use EBCDIC line breaks. If this ever changes, we can update then. Until such time, we'd do more harm than good by loosening the restrictions.

--
Elliotte Rusty Harold
elharo@...
Refactoring HTML Just Published!
http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA
|
|
RE: XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

I agree. Thank you both for researching the issue and for getting the list back up.

Leigh.

-----Original Message-----
From: Jason Hunter [mailto:jhunter@...]
Sent: Saturday, March 21, 2009 7:55 PM
To: jdom interest
Cc: Klotz, Leigh
Subject: Re: [jdom-interest] XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Note that this had a pretty good debate on xml-dev (while our list was down):
http://markmail.org/message/wqcmohlf7srpqhkl

General consensus seems to be the current behavior is the lesser of two evils.

-jh-