Changes to DOM3 Events Key Identifiers

Hi, Folks-

(BCC to potentially affected groups: w3c-html-cg, public-webapps, public-i18n-core, wai-xtech, www-svg, public-forms, public-xhtml2, public-html@..., www-voice... please forward on to any relevant groups or individuals I may have missed, especially outside W3C.)

As editor of the DOM3 Events specification, I made what some may consider to be drastic changes in the most recent drafts:
* I changed the syntax of the key identifier strings from "U+xxxx" (a plain string representing the Unicode code point) to "\uxxxx" (an escaped UTF-16 character string), based on content author and implementer feedback.
* I renamed the "key identifier(s)" feature to "key value(s)".

I've mentioned these ideas before in DOM3 Events telcons, and finally decided to do it, after first consulting with the I18n WG, who generally approved of the scheme (though not without some comments about details that will need to be addressed and resolved).

The new string format should be easier to deal with for developers, and the new name reflects some confusion I've encountered when explaining what "key identifiers" are... the word "identifier" seems to evoke the concept of a unique identifier for a key, when in fact what the feature does is provide the most appropriate value given the state of keyboard modifiers and modes. I have tried also to clarify this in the prose of the spec.

We are aware that there may already be implementations and specifications that rely on the previous string format and name (as well as links), back from when this was a W3C Note, and we do not make this decision lightly, but we do believe this is the right decision for a stable and internationalized keyboard interface going forward. For those implementations and specifications that need the previous functionality and name, you may be able to reference the SVG Tiny 1.2 specification [2] instead, which does include the old Key Identifiers feature more or less intact from the previous definition, and is a stable W3C Recommendation.

You can review the changes in the most recent Editor's Draft [1]. The WebApps WG welcomes your feedback to the www-dom@... list. This specification is still a work in progress, though we do hope to go to Last Call soon, so we are open to suggestions. (Note that the spec is mostly feature-complete, so new event types and other changes may have to wait for the next version, but send them on anyway.)

[1] http://dev.w3.org/2006/webapi/DOM-Level-3-Events/html/DOM3-Events.html#keyset
[2] http://www.w3.org/TR/SVGTiny12/svgudom.html#KeyIdentifiersSet

Regards-
-Doug Schepers, on behalf of the WebApps WG
Editor, DOM Level 3 Events
W3C Team Contact, SVG and WebApps WGs
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi Doug & www-dom crowd,

--Original Message--:
>Hi, Folks-
>
>(BCC to potentially affected groups: w3c-html-cg, public-webapps,
>public-i18n-core, wai-xtech, www-svg, public-forms, public-xhtml2,
>public-html@..., www-voice... please forward on to any relevant
>groups or individuals I may have missed, especially outside W3C.)
>
>As editor of the DOM3 Events specification, I made what some may
>consider to be drastic changes in the most recent drafts:
> * I changed the syntax of the key identifier strings from "U+xxxx" (a
>plain string representing the Unicode code point) to "\uxxxx" (an
>escaped UTF-16 character string), based on content author and
>implementer feedback.

I think this is a terrible change, predominantly for SVG Tiny 1.2 user agents that want to support DOM3 as well.

In Tiny devices memory footprint is critical, and this just introduces an entire extra set of strings that will bloat the binary for no added functionality.

The "\uxxxx" syntax is reminiscent of a programming language but has little to do with strings returned from DOM APIs. I suppose you could argue that it's useful in some situations, but my concern is more about the existence of 2 things that mean the same thing.

I.e., in a CDF document, the script has to deal with "U+xxxx" or "\uxxxx" depending on which namespace the element is living in. I know that you can use 'keyIdentifier' and 'keyValue' to distinguish them, however that pushes the detection logic to the wrong place.

With the existence of the SVG Tiny 1.2 recommendation using the old identifiers, it's hard to see the need for this change.

> * I renamed the "key identifier(s)" feature to "key value(s)".

That is a nice change - and does reflect what the semantics are. It's also a minimal implementation burden to support both keyIdentifier and keyValue.

>I've mentioned these ideas before in DOM3 Events telcons, and finally
>decided to do it, after first consulting with the I18n WG, who generally
>approved of the scheme (though not without some comments about details
>that will need to be addressed and resolved).
>
>The new string format should be easier to deal with for developers, and
>the new name reflects some confusion I've encountered when explaining
>what "key identifiers" are... the word "identifier" seems to evoke the
>concept of a unique identifier for a key, when in fact what the feature
>does is provide the most appropriate value given the state of keyboard
>modifiers and modes. I have tried also to clarify this in the prose of
>the spec.
>
>We are aware that there may already be implementations and
>specifications that rely on the previous string format and name (as well
>as links), back from when this was a W3C Note, and we do not make this
>decision lightly, but we do believe this is the right decision for a
>stable and internationalized keyboard interface going forward. For
>those implementations and specifications that need the previous
>functionality and name, you may be able to reference the SVG Tiny 1.2
>specification [2] instead, which does include the old Key Identifiers
>feature more or less intact from the previous definition, and is a
>stable W3C Recommendation.

Given a stable recommendation using the old identifier strings, I'd suggest there's no need for that change. It will simply add to the implementor effort required to support both SVG Tiny 1.2 and DOM3.

It would be interesting to hear of any technically sound argument as to why '\uxxxx' is superior to 'U+xxxx', given they both consume the same number of bytes in script.

Best regards,
Alex
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi, Alex-

Alex Danilo wrote (on 10/30/09 4:05 AM):
>
> --Original Message--:
>>
>>As editor of the DOM3 Events specification, I made what some may
>>consider to be drastic changes in the most recent drafts:
>> * I changed the syntax of the key identifier strings from "U+xxxx" (a
>>plain string representing the Unicode code point) to "\uxxxx" (an
>>escaped UTF-16 character string), based on content author and
>>implementer feedback.
>
> I think this is a terrible change.
>
> Predominantly for SVG Tiny 1.2 user agents that want to support
> DOM3 as well.
>
> In Tiny devices memory footprint is critical and this just introduces
> an entire extra set of strings that will bloat the binary for no
> added functionality.

Why not simply store the Unicode code point, and compose it with "U+" or "\u" on the fly?

> The "\uxxxx" syntax is just reminiscent of a programming language
> but has little to do with strings returned from DOM APIs. I suppose
> you could argue that it's useful in some situations, but my concern
> is more about the existence of 2 things that mean the same
> thing.
>
> i.e. in a CDF document, the script has to deal with "U+xxxx" or
> "\uxxx" dependent on which namespace the element is living
> in. I know that you can use 'keyIdentifier' and 'keyValue' to
> distinguish them, however that pushes the detection logic
> to the wrong place.

How so? If the author gets SVGT1.2's ".keyIdentifier" attribute, the code point is prepended with "U+", and with ".key", it's prepended with "\u". Am I missing some point of optimization?

FWIW, we heard feedback from BitFlash that authors did not like the "U+" syntax, and I believe they may have made some concession to convenience in their implementation (though I don't know the details).

>> * I renamed the "key identifier(s)" feature to "key value(s)".
>
> That is a nice change - and does reflect what the semantics
> are. It's also a minimal implementation burden to support both
> keyIdentifier and keyValue.

Actually, the name of the attribute on the Key interface has been changed to ".key", which is simpler, if less descriptive.

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
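
The "store the code point and compose on the fly" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names toKeyIdentifier and toKeyValue are hypothetical and appear in neither specification, and the "\u" form is read here as the character itself, which is one interpretation of the draft. Named keys such as F1 have no code point and would still need a separate lookup.

```javascript
// Hypothetical helpers, for illustration only
// (not defined by DOM3 Events or SVG Tiny 1.2).

// Old-style identifier: "U+" followed by at least four uppercase hex digits.
function toKeyIdentifier(codePoint) {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// New-style value, read here as the character itself
// (an assumption about what the draft intends).
function toKeyValue(codePoint) {
  return String.fromCodePoint(codePoint);
}

console.log(toKeyIdentifier(0x00a9));         // U+00A9
console.log(toKeyValue(0x00a9) === "\u00A9"); // true
```

Both strings are derived from the same stored integer, so under these assumptions only one table of code points would be needed.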
|
|
Re: Changes to DOM3 Events Key Identifiers

On Oct 30, 2009, at 1:05 AM, Alex Danilo wrote:
>
> The "\uxxxx" syntax is just reminiscent of a programming language
> but has little to do with strings returned from DOM APIs. I suppose
> you could argue that it's useful in some situations, but my concern
> is more about the existence of 2 things that mean the same
> thing.

"\uxxxx" is not a syntax, it is a Unicode string of the actual character. \u introduces the escape sequence for a Unicode code point. So you can compare it directly to a character.

Regards,
Maciej
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi Doug,

--Original Message--:
>Hi, Alex-
>
>Alex Danilo wrote (on 10/30/09 4:05 AM):
>>
>> --Original Message--:
>>>
>>>As editor of the DOM3 Events specification, I made what some may
>>>consider to be drastic changes in the most recent drafts:
>>> * I changed the syntax of the key identifier strings from "U+xxxx" (a
>>>plain string representing the Unicode code point) to "\uxxxx" (an
>>>escaped UTF-16 character string), based on content author and
>>>implementer feedback.
>>
>> I think this is a terrible change.
>>
>> Predominantly for SVG Tiny 1.2 user agents that want to support
>> DOM3 as well.
>>
>> In Tiny devices memory footprint is critical and this just introduces
>> an entire extra set of strings that will bloat the binary for no
>> added functionality.
>
>Why not simply store the Unicode code point, and compose it with "U+" or
>"\u" on the fly?

Because the other key identifiers are not mapped to Unicode code points. For example F1, F2, etc. have no Unicode representation.

On a Windows system, for example, the mappings of key codes to the representational strings are most logically placed at the site of the event handling from a Windows key event. It's at that point that you decide to map 'U+' or '\u' or 'F1', etc. So it's completely contained at the operating system specific level. That doesn't require the DOM tree or the scripting interface to know anything about the underlying source of the key event code. The mapping point architecturally could move, but the most logical place to put it is in the code that deals with OS level events.

Yes, you could compose on the fly - but now you introduce an additional level of detection in the scripting interface to push the key code to string mapping up from the OS specific layers into the scripting interface.

Also, I see the '\u' string as problematic in a historical/cross-language form. The use of '\' as anything is _bad_. I hate to dredge it up, but the single most stupid design decision in MS-DOS was to choose '\' as a path separator vs. CP/M of the day and UNIX that all used '/'. To this day, C++ programmers the world over have to write all this special case code to include '\\' for Windows, and '/' for UNIX, and it's a mess.

Using '\u' in the string for Javascript ignores other languages we may want to bind to the DOM. In some of them '\' is the escape character, and so you need to type '\\' instead.

From an implementation point of view, anything can be supported, but there is one guiding principle that is a nice one - don't repeat yourself. Our aim with the multi-namespace model is to unify everything. This seems like fragmentation. If an author wants to handle an event in an SVG/HTML document, do they author with "U+xxxx" or "\uxxxx"? It's confusing, and of no visible benefit.

If there's a JSON reason to make this change, then let's hear the reasons, but at the moment I can only see downsides in both implementation time/complexity and author confusion in a compound document environment, as well as the added pain for programmers dealing with backslash hell :-)

(And before the Apple fanbois get cocky about \ vs. / and "we told you so", the Mac OS 9 (and earlier) decision to make '\n' emit a '\r' into the file and subvert C newline conventions is just as bad :)

>> The "\uxxxx" syntax is just reminiscent of a programming language
>> but has little to do with strings returned from DOM APIs. I suppose
>> you could argue that it's useful in some situations, but my concern
>> is more about the existence of 2 things that mean the same
>> thing.
>>
>> i.e. in a CDF document, the script has to deal with "U+xxxx" or
>> "\uxxx" dependent on which namespace the element is living
>> in. I know that you can use 'keyIdentifier' and 'keyValue' to
>> distinguish them, however that pushes the detection logic
>> to the wrong place.
>
>How so? If the author gets SVGT1.2's ".keyIdentifier" attribute, the
>code point is prepended with "U+", and with ".key", it's prepended with
>"\u". Am I missing some point of optimization?

Well, I'd personally avoid '\' in any form of string identifier returned from anything, since historically it's been used as a special escape. In C/C++ you use \x or similar where it has special meaning, but there's no \u, whereas the string returned from keyIdentifier is a descriptive string, not an escape mechanism. There is a distinction.

>FWIW, we heard feedback from BitFlash that authors did not like the "U+"
>syntax, and I believe they may have made some concession to convenience
>in their implementation (though I don't know the details).

That's the problem with users, they get upset about their own needs without realizing the implications of such a 'simple' change.

It would be nice if someone could actually point to why '\uxxxx' is better than 'U+xxxx' from a technical perspective.

Avoiding the use of '\' as I said above is far more important. Oh, and making sure there's a lack of redundancy that just creates author confusion.

>>> * I renamed the "key identifier(s)" feature to "key value(s)".
>>
>> That is a nice change - and does reflect what the semantics
>> are. It's also a minimal implementation burden to support both
>> keyIdentifier and keyValue.
>
>Actually, the name of the attribute on the Key interface has been
>changed to ".key", which is simpler, if less descriptive.

Sounds good. That's of minor consequence compared to my concerns above.

But anyway, as implementors we will willingly build what is asked for. However, in the presence of existing mechanisms to handle the identifiers, it's hard to justify adding another way to do the same thing and then bringing on the possibility that some user agents will support only one or both of the alternatives, forcing a least common denominator authoring approach for interoperable content...

Cheers,
Alex
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi Maciej,

--Original Message--:
>
>On Oct 30, 2009, at 1:05 AM, Alex Danilo wrote:
>
>>
>> The "\uxxxx" syntax is just reminiscent of a programming language
>> but has little to do with strings returned from DOM APIs. I suppose
>> you could argue that it's useful in some situations, but my concern
>> is more about the existence of 2 things that mean the same
>> thing.
>
>"\uxxxx" is not a syntax, it is a Unicode string of the actual
>character. \u introduces the escape sequence for a unicode code point.
>So you can compare it directly to a character.

Thanks for the clarification.

'\x' in a C/C++ string specifies the hexadecimal value, which is analogous to what is proposed here.

Does this map natively in Javascript (forgive my ignorance)? Is this a Javascript-specific mapping? I don't know how this will map to other DOM language bindings.

So, how does this provide any advantage over 'keyCode'?

It seemed to me that keyCode is used for the code point, as in maps to the Unicode point, and the whole reason there was keyIdentifier was to provide descriptive strings.

If I want the Unicode point explicitly I can use keyCode, or am I missing something?

Cheers,
Alex
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi,

I really really don’t see the benefit of that change. It only complicates things because it looks similar to a JS-string encoded character but is not actually one. I.e., "\u2018" does not equal "\\u2018".

By introducing a backslash, I can already see people getting confused by this, writing:

    if (event.key == "\u2018")

instead of

    if (event.key == "\\u2018")

I know I often forget to double-escape the backslash (especially when I write regular expressions in strings :)), and then scratch my head over why it doesn’t work.

Additionally, using \u gives the impression that it is an encoded string character, but that is not the case. In my experience people have trouble realising the difference between serialisation and the actual character (e.g. they think that .nodeValue of a text node containing &amp; will return &amp; instead of &). I think introducing this ‘encoding in a string value’ does not help people.

Why would the new syntax be more intuitive than the old one? Just because one implementor says they got author feedback about this? Well, hereby I give you the feedback that as an author I do not ‘like’ the UTF-16 escaped character string :). I think it’s nicer to use the U+-notation, which is also used all over the place in specifications, web logs, the Windows character map, etc. Contrary to that, the JS syntax is not familiar to me at all. I never use the JS-\u-notation; I encode the files as Unicode and insert the actual characters, which makes the strings readable.

Also what about characters outside the 16-bit plane? You say escaped *UTF-16* character string. Does that mean for the U+10000 character, the resulting string will instead be \uD800\uDC00? In this case, I definitely think the latter is much more complicated.

And of course there is also the matter that here you are changing an existing property which is referenced in a REC specification (SVG Tiny 1.2).
Because you do not really break backwards compatibility, as you change the property name as well, this could be acceptable; however, I can see no arguments for making such a change.

To be clear, I do not consider “based on content author and implementer feedback” to be a well-founded argument to justify such a change :). I hope here I *did* provide some concrete reasons why this change is *not* a good idea. Especially given the confusing double-backslash-escaping described above, I do not think the current change was well thought through by those who gave the feedback.

If you really want to make things easier, you could return not a key identifier but the character itself, so you can compare it directly. E.g.:

    if (event.key == "\u00A9")

or

    if (event.key == "©")

~Laurens

--
~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, developer, Utrecht, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com
|
|
Re: Changes to DOM3 Events Key Identifiers

On 30-10-2009 10:32, Maciej Stachowiak wrote:
> "\uxxxx" is not a syntax, it is a Unicode string of the actual
> character. \u introduces the escape sequence for a unicode code point.
> So you can compare it directly to a character.

Now I’m confused. The way Doug phrased it, \uxxxx *will* be syntax, i.e. the string "U+xxxx" will be replaced by "\\uxxxx" (a 6-character string containing an identifier). Not "\uxxxx" (a 1-character string containing the actual character) which could be compared directly to a character.

Otherwise, I would suggest not to talk in terms of "\uxxxx" strings at all; after all, the DOM specification does not need to concern itself with serialisation, but should instead just talk about characters and code points.

~Laurens

--
~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, developer, Utrecht, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com
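
The difference between the two readings can be checked directly in JavaScript. The snippet below is only a sketch of the string-literal behaviour; the variable names are illustrative, and it says nothing about which reading the draft actually intends.

```javascript
// The parser resolves the escape: this string holds one character,
// LEFT SINGLE QUOTATION MARK.
const actualCharacter = "\u2018";

// The backslash itself is escaped: this string holds six characters,
// a backslash followed by "u2018".
const escapeText = "\\u2018";

console.log(actualCharacter.length);             // 1
console.log(escapeText.length);                  // 6
console.log(actualCharacter === "‘");            // true
console.log(actualCharacter === escapeText);     // false
```

If event.key held the six-character form, only a comparison against "\\u2018" would match; if it held the character itself, a comparison against "\u2018" or "‘" would match.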
|
|
Re: Changes to DOM3 Events Key Identifiers

On 30-10-2009 11:36, Alex Danilo wrote:
>> "\uxxxx" is not a syntax, it is a Unicode string of the actual
>> character. \u introduces the escape sequence for a unicode code point.
>> So you can compare it directly to a character.
>
> Thanks for the clarification.

Well, regardless of whether what Maciej says is actually the case,

> So, how does this provide any advantage over 'keyCode'?

keyCode returns an integer, which in JavaScript is not directly comparable to a character (well, it is, but not like in C; 101 == "101"). JavaScript does not have a character type, only a string type. Actually I think C does not have a character type either, it is an alias for byte, no? Either way, in JavaScript 169 != "©".

> It seemed to me that keyCode is used for the code point, as in maps
> to the Unicode point and the whole reason there was keyIdentifier
> was to provide descriptive strings.

keyCode does not map to Unicode code points, e.g. F1-F24 map to values 112-135, which are not Unicode.

> If I want the Unicode point explicitly I can use
> keyCode or am I missing something?

Hope this cleared that up.

~Laurens

--
~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, developer, Utrecht, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com
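
The integer-versus-string comparisons mentioned above behave as follows in JavaScript. This is a small sketch of the language's loose-equality rules only; it uses plain numbers rather than a real keyCode, since keyCode values are not Unicode code points in general.

```javascript
// Loose equality converts the string operand to a number first.
const numberVsDigits = (101 == "101"); // "101" converts to 101
const numberVsSymbol = (169 == "©");   // "©" converts to NaN

console.log(numberVsDigits); // true
console.log(numberVsSymbol); // false

// To compare a number with a character, convert explicitly.
console.log(String.fromCharCode(169) === "©"); // true
console.log("©".charCodeAt(0) === 169);        // true
```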
|
|
Re: Changes to DOM3 Events Key Identifiers
Hi Laurens,
--Original Message--:
>On 30-10-2009 11:36, Alex Danilo wrote:
>>> "\uxxxx" is not a syntax, it is a Unicode string of the actual
>>> character. \u introduces the escape sequence for a unicode code point.
>>> So you can compare it directly to a character.
>>
>> Thanks for the clarification.
>
>Well, regardless of whether what Maciej says is actually the case,
>
>> So, how does this provide any advantage over 'keyCode'?
>
>keyCode returns an integer, which in JavaScript is not directly
>comparable to a character (well, it is, but not like in C; 101 ==
>"101"). JavaScript does not have a character type, only a string type.
>Actually I think C does not have a character type either, it is an alias
>for byte, no? Either way, in JavaScript 169 != "©".
>
>> It seemed to me that keyCode is used for the code point, as in maps
>> to the Unicode point and the whole reason there was keyIdentifier
>> was to provide descriptive strings.
>
>keyCode does not map to Unicode code points, e.g. F1-F24 map to values
>112-135 which are not Unicode.
>
>> If I want the Unicode point explicitly I can use
>> keyCode or am I missing something?
>
>Hope this cleared that up.

Yes, thanks, it did. Much appreciated.

Alex
|
|
Re: Changes to DOM3 Events Key Identifiers

I want to point out that Unicode code points can go up to hex 10FFFF. The standard for \u is exactly 4 digits, so that one can intermix with characters and know where it terminates. There are a couple of schemes that are used to extend this to up to 6 digits, and still know where to terminate.

\UXXXXXXXX - C++, ICU
\UXXXXXX - C#
\u{xxxxxx} - Ruby

There needs to be some mechanism for extending to 6 digits. It would be best to use one of the above rather than a new one. (My personal favorite is Ruby's.)

Mark

On Fri, Oct 30, 2009 at 00:32, Doug Schepers <schepers@...> wrote:
> Hi, Folks-
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi, Mark-

Mark Davis ☕ wrote (on 10/30/09 12:22 PM):
> I want to point out that Unicode code points can go up to hex 10FFFF.
> The standard for \u is exactly 4 digits, so that one can intermix with
> characters and know where it terminates. There are a couple of schemes
> that are used to extend this to up to 6 digits, and still know where to
> terminate.
>
> \UXXXXXXXX - C++, ICU
> \UXXXXXX - C#
> \u{xxxxxx} - Ruby
>
> There needs to be some mechanism for extending to 6 digits. It would be
> best to use one of the above rather than a new one. (My personal
> favorite is Ruby's.)

The reason the "\u" escaped character sequence was chosen was that it is the native ECMAScript escape notation, which is easy for browser-based applications to use directly (i.e. they can inject it directly into the markup as a character).

But, yes, this does have the cap of 4 digits, and I personally would prefer to use a different escape mechanism... but only if one or both of these 2 conditions obtains:

1) DOM3 Events implementations also update their Javascript engines to be able to process the additional escape sequence (e.g. one of the ones you mention above) in the same way they process the "\u" escape sequence. This is the better long-term solution, and I'd hope ECMA TC39 could be persuaded to add this to future ECMAScript specs.

2) Script authors could use a normalizing method (cf. convertKeyValue) to "dumb down" the 6-digit escape sequence into the 4-digit format (by converting to surrogate pairs when necessary).

Javascript is becoming increasingly important, and so is the need for internationalized and localized language support. With the new font-linking enablers (including my favorite, WOFF [1]), and i18n domain extension policy [2], we're going to see more use of languages I have no chance of ever understanding, and I want DOM3 Events and ECMAScript to be part of that.

I'd rather not introduce a not-very-good solution (UTF-16) that we know would not meet all the needs of the world community, just because of a (temporary?) circumstance with a vagary of Javascript. But I also want this spec interoperably implemented... so any solution needs the buy-in of the implementers. Any arguments on either side of the coin would help make a more informed decision.

BTW, you stated a preference for the Ruby-style delimited escaped characters... could you say why you prefer that?

[1] http://people.mozilla.com/~jkew/woff/woff-2009-09-16.html
[2] http://www.icann.org/en/announcements/announcement-30oct09-en.htm

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
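
The "convert to surrogate pairs when necessary" step in option 2 comes down to a small piece of arithmetic, sketched below. The function name toUtf16Escapes is hypothetical; it is not the draft's convertKeyValue, whose actual behaviour is not described in this thread, and the output here is the escape text rather than the characters themselves.

```javascript
// Hypothetical helper: shows the UTF-16 arithmetic only,
// not any method defined by DOM3 Events.
function toUtf16Escapes(codePoint) {
  // Format one 16-bit code unit as backslash, "u", four hex digits.
  const hex4 = (unit) =>
    "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");

  if (codePoint <= 0xffff) {
    return hex4(codePoint); // BMP: a single code unit
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return hex4(high) + hex4(low);
}

console.log(toUtf16Escapes(0x00a9));  // \u00A9
console.log(toUtf16Escapes(0x10000)); // \uD800\uDC00
```

The second result matches the \uD800\uDC00 form raised earlier in the thread for U+10000.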
|
|
Re: Changes to DOM3 Events Key Identifiers

If the target of this is JavaScript, then the alternative (which Java has also chosen) is to use the UTF-16 representation, wherein a pair of \u escapes represents each supplementary character (above FFFF). It just needs to be carefully documented.

Mark

On Fri, Oct 30, 2009 at 11:38, Doug Schepers <schepers@...> wrote:
> Hi, Mark-
|
|
Re: Changes to DOM3 Events Key Identifiers

Hi, Laurens-

Laurens Holst wrote (on 10/30/09 7:02 AM):
> On 30-10-2009 10:32, Maciej Stachowiak wrote:
>> "\uxxxx" is not a syntax, it is a Unicode string of the actual
>> character. \u introduces the escape sequence for a unicode code point.
>> So you can compare it directly to a character.
>
> Now I’m confused. The way Doug phrased it, \uxxxx *will* be syntax, i.e.
> the string "U+xxxx" will be replaced by "\\uxxxx" (a 6-character string
> containing an identifier). Not "\uxxxx" (a 1-character string containing
> the actual character) which could be compared directly to a character.
>
> Otherwise, I would suggest not to talk in terms of "\uxxxx" strings at
> all, after all the DOM specification does not need to concern itself
> with serialisation, but instead to just talk about characters and code
> points.

Just to clarify, are you objecting to the loose way I phrased it in my email, or did you review the spec and find problems there?

I may have used the wrong terminology in the email, but the spec is the definitive source that needs to get it right. So, please clarify if you object to the change described in the spec.

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
|
|
Re: Changes to DOM3 Events Key Identifiers

Doug Schepers wrote:
> 1) DOM3 Events implementations also update their Javascript engines to
> be able to process the additional escape sequence (e.g. one of the ones
> you mention above) in the same way they process the "\u" escape
> sequence. This is the better long-term solution, and I'd hope ECMA TC39
> could be persuaded to add this to future ECMAScript specs.

I doubt it, given that such escapes are usually programmatically generated. In any case, ECMAScript is firmly committed to a 16-bit character model.

--
John Cowan    http://www.ccil.org/~cowan    cowan@...
A rabbi whose congregation doesn't want to drive him out of town
isn't a rabbi, and a rabbi who lets them do it isn't a man.
        --Jewish saying
|
|
Re: Changes to DOM3 Events Key Identifiers
Java is committed to 16-bit code units as well, but a relatively small number of additions enabled effective handling of UTF-16 text.

Mark

On Fri, Oct 30, 2009 at 13:42, John Cowan <cowan@...> wrote:
Doug Schepers scripsit:
|
|
Re: Changes to DOM3 Events Key Identifiers
Mark Davis scripsit:

> Java is committed to 16-bit code units as well, but a relatively small
> number of additions enabled effective handling of UTF-16 text.

Good luck getting any such additions into JavaScript, except as a library.

--
John Cowan  http://www.ccil.org/~cowan
They tried to pierce your heart with a Morgul-knife that remains in the wound. If they had succeeded, you would become a wraith under the domination of the Dark Lord. --Gandalf
|
|
RE: Changes to DOM3 Events Key Identifiers
> Doug Schepers scripsit:
>
> > 1) DOM3 Events implementations also update their Javascript engines to
> > be able to process the additional escape sequence (e.g. one of the ones
> > you mention above) in the same way they process the "\u" escape
> > sequence. This is the better long-term solution, and I'd hope ECMA TC39
> > could be persuaded to add this to future ECMAScript specs.
>
> I doubt it, given that such escapes are usually programmatically generated.
> In any case, ECMAScript is firmly committed to a 16-bit character model.

ECMAScript's "firm commitment" to a 16-bit character model (i.e. UTF-16) is not the problem. Lack of support for supplementary characters (that is, those above 0xFFFF in Unicode), however, is a very real problem. No UTF-16 process can escape the fact that, even if one applies a short-sighted limit to BMP characters, a character may require more than one code point to encode.

As long as it is clear that DOM3 Events key identifiers are a string containing possibly more than one code point (and potentially more than one character), the escaping syntax is just a detail of the language.

Addison
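
Addison's point can be made concrete with a rough TypeScript sketch. The helper `countCodePoints` is a hypothetical name written for this illustration, not an API from any specification; it shows that a supplementary character occupies two 16-bit code units, and that even within the BMP a single user-perceived character may span two code points.

```typescript
// Count Unicode code points by treating each well-formed surrogate pair
// (high surrogate followed by low surrogate) as a single code point.
function countCodePoints(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

// U+1D11E MUSICAL SYMBOL G CLEF lies above 0xFFFF.
const clef: string = "\uD834\uDD1E";
console.log(clef.length);            // 2 code units
console.log(countCodePoints(clef));  // 1 code point

// "e" followed by U+0301 COMBINING ACUTE ACCENT: BMP only,
// yet one perceived character made of two code points.
const eAcute: string = "e\u0301";
console.log(eAcute.length);           // 2 code units
console.log(countCodePoints(eAcute)); // 2 code points
```

Either case means a consumer of a key value cannot assume the string has length 1.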
|
|
Re: Changes to DOM3 Events Key Identifiers
Phillips, Addison scripsit:

> ECMAScript's "firm commitment" to a 16-bit character model (i.e. UTF-16)

If only. JavaScript and JSON strings aren't sequences of characters, they are sequences of 16-bit unsigned integers. If you happen to want to interpret them as UTF-16, you are free to do so, but there is not and never will be any guarantee that all strings are well-formed UTF-16.

What's more, the built-in JSON serializer provided by ECMAScript 5th edition does not generate escape sequences for isolated surrogate codepoints, so that some strings will be written out in CESU-8 rather than UTF-8. Worse yet, the JSON RFC is self-contradictory, with the result that it's not even clear that CESU-8-encoded JSON is illegal.

--
Let's face it: software is crap. Feature-laden and bloated, written under tremendous time-pressure, often by incapable coders, using dangerous languages and inadequate tools, trying to connect to heaps of broken or obsolete protocols, implemented equally insufficiently, running on unpredictable hardware -- we are all more than used to brokenness. --Felix Winkelmann
|
|
Re: Changes to DOM3 Events Key Identifiers
> If you happen to want to interpret them as UTF-16, you are free to do
> so, but there is not and never will be any guarantee that all strings
> are well-formed UTF-16.

You never have that guarantee, any more than you have the guarantee that a source purporting to be UTF-8 is in fact well formed. All conscientious recipients need to check the data -- if they are sensitive to ill-formed text. Luckily, the impact of ill-formed UTF-16 is vastly less than that of ill-formed UTF-8.

Mark

On Fri, Oct 30, 2009 at 17:47, John Cowan <cowan@...> wrote:
Phillips, Addison scripsit:
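
The check that a conscientious recipient would perform might look roughly like the following TypeScript sketch. `isWellFormedUtf16` is a hypothetical helper written for this illustration; it assumes that "well-formed" means every high surrogate is immediately followed by a low surrogate and no low surrogate stands alone.

```typescript
// Return true if every surrogate code unit in s is part of a proper pair.
function isWellFormedUtf16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next unit must be a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        return false;
      }
      i++; // consume the low surrogate
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

const paired: string = "\uD834\uDD1E";   // proper surrogate pair
const isolated: string = "a\uD834b";     // high surrogate on its own
const reversed: string = "\uDD1E\uD834"; // pair in the wrong order

console.log(isWellFormedUtf16(paired));   // true
console.log(isWellFormedUtf16(isolated)); // false
console.log(isWellFormedUtf16(reversed)); // false
```

The language accepts all three strings without complaint, which is John's point; the check has to be applied by whoever is sensitive to ill-formed text, which is Mark's.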