Re: Changes to DOM3 Events Key Identifiers

Hi, Mark-

Mark Davis ☕ wrote (on 10/30/09 12:22 PM):
> I want to point out that Unicode code points can go up to hex 10FFFF.
> The standard for \u is exactly 4 digits, so that one can intermix with
> characters and know where it terminates. There are a couple of schemes
> that are used to extend this to up to 6 digits, and still know where to
> terminate.
>
> \UXXXXXXXX - C++, ICU
> \UXXXXXX - C#
> \u{xxxxxx} - Ruby
>
> There needs to be some mechanism for extending to 6 digits. It would be
> best to use one of the above rather than a new one. (My personal
> favorite is Ruby's.)

The reason the "\u" escaped character sequence was chosen was that it is the native ECMAScript escape notation, which is easy for browser-based applications to use directly (i.e. they can inject it directly into the markup as a character).

But, yes, this does have the cap of 4 digits, and I personally would prefer to use a different escape mechanism... but only if one or both of these 2 conditions obtains:

1) DOM3 Events implementations also update their Javascript engines to be able to process the additional escape sequence (e.g. one of the ones you mention above) in the same way they process the "\u" escape sequence. This is the better long-term solution, and I'd hope ECMA TC39 could be persuaded to add this to future ECMAScript specs.

2) Script authors could use a normalizing method (cf. convertKeyValue) to "dumb down" the 6-digit escape sequence into the 4-digit format (by converting to surrogate pairs when necessary).

Javascript is becoming increasingly important, and so is the need for internationalized and localized language support. With the new font-linking enablers (including my favorite, WOFF [1]), and i18n domain extension policy [2], we're going to see more use of languages I have no chance of ever understanding, and I want DOM3 Events and ECMAScript to be part of that.

I'd rather not introduce a not-very-good solution (UTF-16) that we know would not meet all the needs of the world community, just because of a (temporary?) circumstance with a vagary of Javascript. But, I also want this spec interoperably implemented... so, any solution needs the buy-in of the implementers. Any arguments on either side of the coin would help make a more informed decision.

BTW, you stated a preference for the Ruby-style delimited escaped characters... could you say why you prefer that?

[1] http://people.mozilla.com/~jkew/woff/woff-2009-09-16.html
[2] http://www.icann.org/en/announcements/announcement-30oct09-en.htm

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
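The "dumb down" step in option 2 can be sketched in a few lines of script. This is only an illustration under stated assumptions: the input uses the Ruby-style \u{xxxxxx} notation from Mark's list, and the helper names (escapeUnit, codePointToEscapes, normalizeEscapes) are hypothetical. None of them is part of DOM3 Events, and this is not the convertKeyValue method mentioned above.

```javascript
// Hypothetical helper: format one 16-bit code unit as a \uXXXX escape.
function escapeUnit(unit) {
  var hex = unit.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "\\u" + hex;
}

// Hypothetical helper: turn a Unicode code point (0..0x10FFFF) into one
// \uXXXX escape, or two (a surrogate pair) when it is above 0xFFFF.
function codePointToEscapes(cp) {
  if (cp < 0 || cp > 0x10FFFF) throw new RangeError("not a code point: " + cp);
  if (cp <= 0xFFFF) return escapeUnit(cp);
  var offset = cp - 0x10000;               // 20 bits remain
  var high = 0xD800 + (offset >> 10);      // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);     // bottom 10 bits
  return escapeUnit(high) + escapeUnit(low);
}

// Rewrite every \u{...} escape (1 to 6 hex digits) into 4-digit form.
function normalizeEscapes(text) {
  return text.replace(/\\u\{([0-9A-Fa-f]{1,6})\}/g, function (match, hex) {
    return codePointToEscapes(parseInt(hex, 16));
  });
}
```

With these definitions, the text \u{1D504} would be rewritten as \uD835\uDD04, while an escape such as \u{41} would become \u0041.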
Re: Changes to DOM3 Events Key Identifiers

If the target of this is JavaScript, then the alternative (which Java has also chosen) is to use the UTF-16 representation, wherein a pair of \u characters represents each supplementary character (above FFFF). It just needs to be carefully documented.

Mark

On Fri, Oct 30, 2009 at 11:38, Doug Schepers <schepers@...> wrote:
> Hi, Mark-
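To make the "pair of \u characters" concrete, here is a small sketch of reading such a pair back as a single code point. The function name pairToCodePoint is hypothetical, and U+1D504 is just an arbitrary supplementary character chosen for illustration.

```javascript
// Hypothetical helper: combine a high and a low surrogate code unit
// into the supplementary code point they represent.
function pairToCodePoint(high, low) {
  if (high < 0xD800 || high > 0xDBFF) throw new RangeError("not a high surrogate");
  if (low < 0xDC00 || low > 0xDFFF) throw new RangeError("not a low surrogate");
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

var s = "\uD835\uDD04"; // U+1D504 written as two \u escapes
var cp = pairToCodePoint(s.charCodeAt(0), s.charCodeAt(1));
```

Here s.length is 2, because the string holds two 16-bit code units, while cp is the single value 0x1D504. That mismatch is the part that would need careful documentation.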
Re: Changes to DOM3 Events Key Identifiers

Doug Schepers scripsit:

> 1) DOM3 Events implementations also update their Javascript engines to
> be able to process the additional escape sequence (e.g. one of the ones
> you mention above) in the same way they process the "\u" escape
> sequence. This is the better long-term solution, and I'd hope ECMA TC39
> could be persuaded to add this to future ECMAScript specs.

I doubt it, given that such escapes are usually programmatically generated. In any case, ECMAScript is firmly committed to a 16-bit character model.

--
A rabbi whose congregation doesn't want         John Cowan
to drive him out of town isn't a rabbi,         http://www.ccil.org/~cowan
and a rabbi who lets them do it                 cowan@...
isn't a man.    --Jewish saying
Re: Changes to DOM3 Events Key Identifiers

Java is committed to 16-bit code units as well, but a relatively small number of additions enabled effective handling of UTF-16 text.

Mark

On Fri, Oct 30, 2009 at 13:42, John Cowan <cowan@...> wrote:
> Doug Schepers scripsit:
Re: Changes to DOM3 Events Key Identifiers

Mark Davis ☕ scripsit:

> Java is committed to 16-bit code units as well, but a relatively small
> number of additions enabled effective handling of UTF-16 text.

Good luck getting any such additions into JavaScript, except as a library.

--
They tried to pierce your heart                 John Cowan
with a Morgul-knife that remains in the         http://www.ccil.org/~cowan
wound.  If they had succeeded, you would
become a wraith under the domination
of the Dark Lord.         --Gandalf
RE: Changes to DOM3 Events Key Identifiers

> Doug Schepers scripsit:
>
> > 1) DOM3 Events implementations also update their Javascript engines to
> > be able to process the additional escape sequence (e.g. one of the ones
> > you mention above) in the same way they process the "\u" escape
> > sequence. This is the better long-term solution, and I'd hope ECMA TC39
> > could be persuaded to add this to future ECMAScript specs.
>
> I doubt it, given that such escapes are usually programmatically
> generated.
> In any case, ECMAScript is firmly committed to a 16-bit character
> model.

ECMAScript's "firm commitment" to a 16-bit character model (i.e. UTF-16) is not the problem. Lack of support for supplementary characters (that is, those above 0xFFFF in Unicode), however, is a very real problem. No UTF-16 process can escape the fact that, even if one applies a short-sighted limit to BMP characters, a character may require more than one code point to encode.

As long as it is clear that DOM3 Events key identifiers are a string containing possibly more than one code point (and potentially more than one character), the escaping syntax is just a detail of the language.

Addison
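Addison's point that a key identifier string may hold more than one code point can be illustrated with a short sketch. The helper name countCodePoints and the two sample strings are assumptions made for this example, not values taken from the DOM3 Events draft.

```javascript
// Hypothetical helper: count code points in a string. A well-formed
// surrogate pair counts once; any other code unit (including an
// unpaired surrogate) counts as one code point of its own.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) i++; // skip the low surrogate
    }
    count++;
  }
  return count;
}

var supplementary = "\uD835\uDD04"; // one code point, two code units
var combining = "e\u0301";          // e + combining acute: two BMP code points
```

Both sample strings have a length of 2, but for different reasons: the first is a single supplementary code point stored as a surrogate pair, and the second is two BMP code points that a reader would perceive as one character. Code that assumes one code unit per key identifier would mishandle either case.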
Re: Changes to DOM3 Events Key Identifiers

Phillips, Addison scripsit:

> ECMAScript's "firm commitment" to a 16-bit character model (i.e. UTF-16)

If only. JavaScript and JSON strings aren't sequences of characters, they are sequences of 16-bit unsigned integers. If you happen to want to interpret them as UTF-16, you are free to do so, but there is not and never will be any guarantee that all strings are well-formed UTF-16.

What's more, the built-in JSON serializer provided by ECMAScript 5th edition does not generate escape sequences for isolated surrogate codepoints, so that some strings will be written out in CESU-8 rather than UTF-8. Worse yet, the JSON RFC is self-contradictory, with the result that it's not even clear that CESU-8-encoded JSON is illegal.

--
Let's face it: software is crap. Feature-laden and bloated, written under tremendous time-pressure, often by incapable coders, using dangerous languages and inadequate tools, trying to connect to heaps of broken or obsolete protocols, implemented equally insufficiently, running on unpredictable hardware -- we are all more than used to brokenness.
        --Felix Winkelmann
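The claim that strings are sequences of 16-bit unsigned integers, with no well-formedness guarantee, can be observed directly. A minimal sketch, with variable names chosen only for illustration:

```javascript
// An unpaired high surrogate is accepted as a one-unit string.
var lone = String.fromCharCode(0xD800);

// A low surrogate followed by a high one: not well-formed UTF-16,
// yet still a perfectly legal string value.
var reversed = "\uDD04\uD835";

// Read the raw 16-bit values back out, unchanged.
var units = [];
for (var i = 0; i < reversed.length; i++) {
  units.push(reversed.charCodeAt(i));
}
```

Neither assignment raises an error, and units ends up holding 0xDD04 followed by 0xD835, exactly as written. Any interpretation as UTF-16 is layered on top by whoever consumes the string.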
Re: Changes to DOM3 Events Key Identifiers

> If you happen to want to interpret them as UTF-16, you are free to do so,
> but there is not and never will be any guarantee that all strings are
> well-formed UTF-16.

You never have that guarantee, any more than you have the guarantee that a source purporting to be UTF-8 is in fact well formed. All conscientious recipients need to check the data -- if they are sensitive to ill-formed text. Luckily, the impact of ill-formed UTF-16 is vastly less than that of ill-formed UTF-8.

Mark

On Fri, Oct 30, 2009 at 17:47, John Cowan <cowan@...> wrote:
> Phillips, Addison scripsit:
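The check that a conscientious recipient would perform is short. The following is a sketch only; the function name firstIllFormedIndex and its return convention (-1 for well-formed input) are assumptions made for this example.

```javascript
// Hypothetical helper: return the index of the first unpaired surrogate
// in `str`, or -1 if the string is well-formed UTF-16.
function firstIllFormedIndex(str) {
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: the next unit must be a low surrogate.
      var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return i;
      i++; // well-formed pair, skip the low half
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return i; // low surrogate with no preceding high surrogate
    }
  }
  return -1;
}
```

A recipient that is sensitive to ill-formed text could call this once on incoming data and then reject the string, or replace the offending unit, at the reported index. Recipients that only pass the string through can skip the check.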