New Working Group Note: Requirements for String Identity Matching and String Indexing

On 15th September, the Internationalization Core Working Group published Requirements for String Identity Matching and String Indexing as a Working Group Note.

http://www.w3.org/TR/charreq/

This document was published as a Working Group Note in order to capture and preserve historical information. It contains requirements elaborated in 1998 for aspects of the character model for W3C specifications. It was developed and extensively reviewed by the Internationalization Working Group, but never progressed beyond Working Draft status. For this publication, the wording of the 1998 version remains unchanged (except for correction of a small number of typographic errors), but the links to references have been updated prior to this publication.

The document describes requirements for some important aspects of the character model for W3C specifications. The two aspects discussed are string identity matching and string indexing.

Editor: Martin Dürst.

RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

My initial comments on "Requirements for String Identity Matching and String Indexing" (http://www.w3.org/TR/charreq/) are on proofreading!

2.3, PAR 2, last sentence

"A process shall not assume that the interpretations of two canonical-equivalent character sequences are distinct. Additions may include some presentation forms."

{CORRECTION: "canonical-equivalent" >= "canonically-equivalent". See the text at http://en.wikipedia.org/wiki/Unicode_equivalence for an example of the use of "canonically-equivalent".}

* * *

2.10, PAR 2, first bullet

"It is a prerequisite for be conservative in what you send "

{CORRECTION: >= "It is prerequisite to being conservative in what you send." Alternately, >= "It is prerequisite to one's being conservative in what is sent."}

* * *

3.2, PAR 1, last sentence

"As an example, it could be required that text transmitted via certain protocols, or text exposed in certain APIs, is normalized."

{COMMENTS: ?? You used the indicative ("is normalized") and not the subjunctive, which may be o.k. in the U.K., but in the U.S. the correct grammar is "is normalized" >= ?? "be normalized." Also, I would like some examples of the protocols here!}

* * *

3.2, last PAR, last sentence

"Such a transfer is indeed highly desirable in many cases, because to avoid generating unnormalized data is in many cases easier than to normalize such data later."

{CORRECTION/COMMENT: broken verb predicate (I think it's better to keep these together when you can): >= "Such a transfer is indeed highly desirable in many cases, because it is in many cases easier to avoid generating unnormalized data than it is to normalize such data later."}

* * *

4.4

{COMMENT/CORRECTION??: I think I'd prefer >= "sub-elements" and >= "sub-element" [that is, I think this word needs a hyphen, but some people don't hyphenate, IBM for example; see: http://www.google.com/search?hl=fr&source=hp&q=sub-element&btnG=Recherche+Google&lr=&aq=f&oq=!]}

* * *

I'll follow with a few questions/comments on the contents shortly!

Best,
C. E. Whitehead
cewcathar@...

> From: ishida@...
> To: www-international@...
> Date: Thu, 1 Oct 2009 15:38:40 +0100
> Subject: New Working Group Note: Requirements for String Identity Matching and String Indexing
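
For readers who have not met canonical equivalence before, the sentence quoted from section 2.3 can be made concrete with a small C sketch. It is only an illustration under stated assumptions: the byte values are the standard UTF-8 encodings of U+00E9 and of U+0065 followed by U+0301, the helper names (`same_bytes`, `show_canonical_mismatch`) are made up for this example, and nothing here comes from the Note itself. It shows that a naive byte comparison treats two canonically equivalent spellings of "é" as different strings, which is the situation normalization is meant to avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: naive identity check that compares raw bytes only. */
static int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* "é" as one precomposed code point, U+00E9, encoded in UTF-8. */
static const char PRECOMPOSED[] = "\xC3\xA9";

/* "é" as "e" (U+0065) plus COMBINING ACUTE ACCENT (U+0301), encoded in UTF-8. */
static const char DECOMPOSED[] = "e\xCC\x81";

/* Prints the byte lengths and the result of the naive comparison. */
static void show_canonical_mismatch(void)
{
    /* Sanity-check the encodings: 2 bytes versus 1 + 2 bytes. */
    assert(strlen(PRECOMPOSED) == 2);
    assert(strlen(DECOMPOSED) == 3);

    printf("precomposed: %zu bytes\n", strlen(PRECOMPOSED));
    printf("decomposed:  %zu bytes\n", strlen(DECOMPOSED));
    printf("byte-identical: %s\n",
           same_bytes(PRECOMPOSED, DECOMPOSED) ? "yes" : "no");
}
```

Calling `show_canonical_mismatch()` from a `main` function is enough to try it. A real matcher would normalize both strings (for example to NFC) before comparing; that step needs a Unicode library and is deliberately left out of this sketch.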

RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

I have one more proofreading comment for "Requirements for String Identity Matching and String Indexing" (W3C Working Group Note 15 September 2009).

3.3, sentence 3

"It may also provide a bit more time, in that we are just defining what might happen naturally anyway instead of having to fight uphill from day one."

{COMMENT: wordy: >= "By doing so we are defining what might happen naturally anyway . . ."}

Best,
--C. E. Whitehead
cewcathar@...

* * *

From: cewcathar@...
To: ishida@...; www-international@...
Date: Mon, 5 Oct 2009 15:37:22 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

My remaining comments on "Requirements for String Identity Matching and String Indexing" (W3C Working Group Note 15 September 2009) are on the content (but since this document is being published 'for historical reasons' I don't know if these will be helpful).

* * *

2.4, PAR 2

"These differences can be handled by the (mainly native) users of the characters in question, and can at least be identified by users not familiar with the characters in question. Such similarities are explicitly not considered for string identity matching, because they do not need a coordinated solution for the entirety of the WWW."

{COMMENT: All three differences?? Lower case versus upper case (or connected beginning, connected end/middle, and unconnected forms in Arabic) and diacritics?? I think these require a coordinated WWW solution, especially in the case of IRIs. When I search and have no way to type in diacritics, I prefer that letters with or without diacritics be treated as the same; the same goes for upper and lower case. That is great for searching, so solutions may vary, but with respect to the internationalization of URIs, all of these should be covered carefully by a universal WWW policy. Perhaps the "clear character" model mentioned in section 4.7 may solve this problem?? I'm not sure.}

* * *

4.1, PAR 2

"Note: In many cases, it is highly preferable to use non-numeric ways of identifying substrings. The specification of string indexing for the WWW should not be seen as a general recommendation for the use of string indexing for substring identification. As an example, in the case of translation of a document from one language to another, identification of substrings based on document structure can be expected to be much more stable than identification based on string indexing."

I suppose there is already a W3C recommendation for document structure; I think a link to this would be helpful here???

* * *

Best,
C. E. Whitehead
cewcathar@...

From: cewcathar@...
To: ishida@...; www-international@...
Date: Tue, 6 Oct 2009 20:26:08 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

Re: New Working Group Note: Requirements for String Identity Matching and String Indexing

* Richard Ishida wrote:
>http://www.w3.org/TR/charreq/

Two things are identical if you cannot tell them apart. Two things are merely equivalent in some context if the differences between them are of no concern in that context. The document confuses these terms as should be apparent from awkward phrases like "The string identity matching specification shall not treat as equivalent"; clearly a specification defining *identity* would treat things as *identical*, not as /equivalent/.

The definition "Two strings match as identical if they contain no user-identifiable distinctions" is inherently incorrect. If there is any difference at all between two strings then users can necessarily identify them. I find this terminological confusion harmful and would ask the Working Group to either change or withdraw the document.

--
Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

Hi,

Thank you for your comments. However, two points:

1. This document is now published as a WG Note. We shan’t be making any changes to it.

2. This document was published as a WG Note strictly for historical reasons. It formed the basis for the CharMod work but was never formally published as a WG Note. It remained as a Working Draft lo these many years. Because this document is an important milestone, in its way, we felt that we should give it Note status rather than junking it.

Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG

Internationalization is not a feature.
It is an architecture.

From: www-international-request@... [mailto:www-international-request@...] On Behalf Of CE Whitehead

Re: New Working Group Note: Requirements for String Identity Matching and String Indexing

Bjoern Hoehrmann scripsit:

> Two things are identical if you cannot tell them apart. Two things are
> merely equivalent in some context if the differences between them are
> of no concern in that context.

This distinction is not effective in programming. If we took your definition of identity at face value, we'd say that after:

    char *x = "abc";
    char *y = strdup(x);

then x and y are not identical strings, because y can be mutated to differ from x, and that is an observable distinction between them. In which case there is no use talking of string identity at all, for there is none.

> awkward phrases like "The string identity matching specification shall
> not treat as equivalent"; clearly a specification defining *identity*
> would treat things as *identical*, not as /equivalent/.

It's best to take "string identity" as a term of art in this Note.

--
Business before pleasure, if not too bloomering long before.
        --Nicholas van Rijn
John Cowan <cowan@...> http://www.ccil.org/~cowan

Re: New Working Group Note: Requirements for String Identity Matching and String Indexing

* John Cowan wrote:
>Bjoern Hoehrmann scripsit:
>
>> Two things are identical if you cannot tell them apart. Two things are
>> merely equivalent in some context if the differences between them are
>> of no concern in that context.
>
>This distinction is not effective in programming.

It is one of the main characteristics of "programming" that you, as the programmer, get to decide what distinctions you want to make. I do not see how this is relevant here. In your example:

>    char *x = "abc";
>    char *y = strdup(x);

You have different "objects"; those objects are not strings, they are on a higher level. Having different objects but identical strings is not a contradiction.

--
Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
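
The two readings of "identical" in this exchange can be separated in a few lines of C. This is a minimal sketch, not something either poster wrote: the function names (`same_object`, `same_string`, `copy_is_distinct_object_with_same_string`) are hypothetical, and `strdup` is assumed to be available as the POSIX function. It checks that the `strdup` copy is a different object holding an identical character sequence, and that mutating the copy changes only the copy.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: strdup() comes from POSIX */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same object: both pointers refer to the same storage. */
static int same_object(const char *a, const char *b)
{
    return a == b;
}

/* Same string: equal character sequences, wherever they are stored. */
static int same_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Returns 1 if a strdup() copy is a distinct object with identical content,
 * and if mutating the copy leaves the original string untouched. */
static int copy_is_distinct_object_with_same_string(void)
{
    const char *x = "abc";
    char *y = strdup(x);              /* heap copy of the characters */
    assert(y != NULL);

    /* Different objects, identical strings. */
    int result = !same_object(x, y) && same_string(x, y);

    /* Mutating the copy makes the strings differ; x is unchanged. */
    y[0] = 'X';
    result = result && !same_string(x, y) && strcmp(x, "abc") == 0;

    free(y);
    return result;
}
```

On this reading, the pointer comparison corresponds to the "objects" in the reply above and the `strcmp` comparison to the strings themselves; which of the two a specification means by "identity" is exactly the terminology question under discussion.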

RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

O.k., oh well, I'll wait for the next working draft.

--Best,
C. E. Whitehead
cewcathar@...

From: addison@...
To: cewcathar@...; ishida@...; www-international@...
Date: Tue, 6 Oct 2009 23:14:27 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

Re: New Working Group Note: Requirements for String Identity Matching and String Indexing

Hello Björn,

Many thanks for your comments.

On 2009/10/07 10:48, Bjoern Hoehrmann wrote:
> * Richard Ishida wrote:
>> http://www.w3.org/TR/charreq/
>
> Two things are identical if you cannot tell them apart. Two things are
> merely equivalent in some context if the differences between them are of
> no concern in that context. The document confuses these terms as should
> be apparent from awkward phrases like "The string identity matching
> specification shall not treat as equivalent"; clearly a specification
> defining *identity* would treat things as *identical*, not as
> /equivalent/.
>
> The definition "Two strings match as identical if they contain no
> user-identifiable distinctions" is inherently incorrect. If there is any
> difference at all between two strings then users can necessarily
> identify them. I find this terminological confusion harmful and would
> ask the Working Group to either change or withdraw the document.

The document has been around as a Working Draft for over 10 years. Nobody saw that problem, or thought that it was serious enough to comment on (let alone to withdraw the document).

In addition to what others have said, I'd also note that at a requirements stage, the main question is "what do we need", not "what are we going to call it".

Regards,
Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...