Re: what's the language of a document ?


Re: what's the language of a document ?

by Martin Kliehm-3

On 25.10.2009, at 09:38, Divya Manian <divya.manian@...> wrote:

> Internationalization best practices [1] states:
>
> “Where a document contains content aimed at speakers of more than one
> language, use Content-Language with a comma-separated list of language
> tags.”
>
> The HTML 5 specs [2] state:
>
> “…there is a document-wide default language set, then that is the
> language of the node.
>
> If there is no document-wide default language, then language information
> from a higher-level protocol (such as HTTP), if any, must be used as the
> final fallback language. In the absence of any language information, the
> default value is unknown (the empty string).”
>
> What is not clear is, what happens if a HTML document has a HTTP header
> Content-Language has a comma-separated list of language tags and no other
> language declarations? I found on a thread [3] that states such a document
> will be declared to use "unknown" language in this case. It would be good to
> have this case explicitly stated.

Also, in XHTML notation empty strings are disallowed, so the default
value for "unknown" would in that case be "und". [4]

Cheers,
Martin

[4] http://www.w3.org/International/questions/qa-no-language

Re: what's the language of a document ?

by John Cowan

Martin Kliehm scripsit:

> Also in XHTML notation empty strings are disallowed, so the default
> value for "unknown" would be in that case "und". [4]

Why would empty strings be disallowed in xml:lang attributes?  I can
find no indication of that in XHTML 1.0.

--
Why are well-meaning Westerners so concerned that   John Cowan
the opening of a Colonel Sanders in Beijing means   cowan@...
the end of Chinese culture? [...]  We have had      http://www.ccil.org/~cowan
Chinese restaurants in America for over a century,
and it hasn't made us Chinese.  On the contrary,
we obliged the Chinese to invent chop suey.            --Marshall Sahlins


Re: what's the language of a document ?

by Gunnar Bittersmann-2

John Cowan scripsit:
> Why would empty strings be disallowed in xml:lang attributes?  I can
> find no indication of that in XHTML 1.0.

The DTD says (cf.
http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict):
<!ENTITY % i18n
  "lang        %LanguageCode; #IMPLIED
   xml:lang    %LanguageCode; #IMPLIED
   dir         (ltr|rtl)      #IMPLIED"
   >

The entity 'LanguageCode' was formerly defined as:
<!ENTITY % LanguageCode "NMTOKEN">

NMTOKEN must not be an empty string (cf.
http://www.w3.org/TR/REC-xml/#dt-name).
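
To make the point concrete, here is a rough sketch in Python. The pattern only approximates the XML Nmtoken production with an ASCII subset of the name characters (the real production admits many non-ASCII characters too), and the function name is made up for illustration. Because Nmtoken requires one or more name characters, the empty string can never match, whereas "und" does.

```python
import re

# ASCII-only approximation of XML's  Nmtoken ::= (NameChar)+
# Assumption: the real NameChar set is much larger (non-ASCII letters etc.).
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken(value: str) -> bool:
    """Return True if value is a non-empty run of (ASCII) name characters."""
    return _NMTOKEN_ASCII.fullmatch(value) is not None


for candidate in ("und", "en-US", ""):
    print(repr(candidate), is_nmtoken(candidate))
```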

See also
http://www.w3.org/International/questions/qa-no-language#undetermined

Cheers,
Gunnar




Re: what's the language of a document ?

by Ian Hickson

On Sun, 25 Oct 2009, Divya Manian wrote:

>
> Internationalization best practices [1] states:
>
> “Where a document contains content aimed at speakers of more than one
> language, use Content-Language with a comma-separated list of language
> tags.”
>
> The HTML 5 specs [2] state:
>
> “…there is a document-wide default language set, then that is the
> language of the node.
>
> If there is no document-wide default language, then language information
> from a higher-level protocol (such as HTTP), if any, must be used as the
> final fallback language. In the absence of any language information, the
> default value is unknown (the empty string).”
>
> What is not clear is, what happens if a HTML document has a HTTP header
> Content-Language has a comma-separated list of language tags and no other
> language declarations? I found on a thread [3] that states such a document
> will be declared to use "unknown" language in this case. It would be good to
> have this case explicitly stated.

I've updated the spec to say that when the higher-level protocol reports
multiple languages, they are all ignored in favour of the default
(unknown).
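
As an illustration only (this is not spec text, and the function name is hypothetical), the rule described above could be sketched in Python like this: a Content-Language value carrying exactly one tag is used as the fallback, and a value carrying several tags is ignored, leaving the empty string that stands for "unknown".

```python
def fallback_language(content_language):
    """Return the fallback language taken from a Content-Language value.

    content_language is the raw header value, or None if the header is
    absent. The empty string stands for "unknown".
    """
    if content_language is None:
        return ""
    tags = [tag.strip() for tag in content_language.split(",") if tag.strip()]
    # Exactly one tag: use it. None or several: fall back to unknown.
    return tags[0] if len(tags) == 1 else ""


print(repr(fallback_language("en")))
print(repr(fallback_language("es-mx,es-ar")))
```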


On Sun, 25 Oct 2009, Martin Kliehm wrote:
>
> Also in XHTML notation empty strings are disallowed, so the default
> value for "unknown" would be in that case "und". [4]

On Sun, 25 Oct 2009, John Cowan wrote:
>
> Why would empty strings be disallowed in xml:lang attributes?  I can
> find no indication of that in XHTML 1.0.

In HTML5, the "unknown" value is the empty string (for "lang"). The
xml:lang attribute is defined by the XML spec.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

RE: what's the language of a document ?

by tex-4

Ian,

So if someone attempts to be specific and declares Content-Language to be "es-mx,es-ar" for Mexico and Argentina,
or perhaps declares "en, en-us", then that information is thrown away in favor of unknown?

Also, does this change to the document default language impact just HTML behavior, or embedded scripting languages as well?

If there were code that checks for language and performs different actions based on the languages in the document, is that affected as well?

I assume so.

Why does the default need to be monolingual?
tex

-----Original Message-----
From: www-international-request@... [mailto:www-international-request@...] On Behalf Of Ian Hickson
Sent: Monday, October 26, 2009 6:31 PM
To: Divya Manian; Martin Kliehm; John Cowan
Cc: <public-html@...>; www-international@...
Subject: Re: what's the language of a document ?





RE: what's the language of a document ?

by Ian Hickson

On Mon, 26 Oct 2009, Tex Texin wrote:
>
> So if someone attempts to be specific and declares content-language to
> be "es-mx,es-ar" for mexico and argentina, or perhaps declares "en,
> en-us" then that information is thrown away in favor of unknown?

For the purposes of the CSS :lang() selector, conversion to RDF, and UA
built-in spelling checkers, yes. The information is still conveyed by the
HTTP headers, though, and can be used for whatever purposes the HTTP
headers are intended for.
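
For readers unfamiliar with how :lang() consumes this value, here is a minimal Python sketch of CSS 2.1 style prefix matching. The function name is hypothetical, and treating the unknown (empty) language as matching no range is an assumption of the sketch. It shows why an element whose language is unknown is not selected by :lang(es), even if the HTTP header listed "es-mx,es-ar".

```python
def matches_lang_selector(element_language: str, selector_range: str) -> bool:
    """Prefix matching in the style of the CSS 2.1 :lang() pseudo-class.

    The range matches when the element's language equals it, or starts
    with it followed by a hyphen (compared case-insensitively).
    Assumption: an unknown language ("") matches nothing.
    """
    lang = element_language.lower()
    rng = selector_range.lower()
    if not lang:
        return False
    return lang == rng or lang.startswith(rng + "-")


print(matches_lang_selector("es-MX", "es"))  # language declared on the element
print(matches_lang_selector("", "es"))       # language unknown
```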


> Also, does this change to the document default language impact just html
> behavior, or embedded scripting languages as well?

I don't understand what it would mean to affect embedded scripting
languages; can you elaborate?


> If there were code that checks for language and performs different
> actions based on languages in the document, that is affected as well?

That depends on how it checks for language.


> Why does the default need to be monolingual?

It's not that the default is monolingual, so much as the model used by
HTML has a single language per Element node. HTML itself supports multiple
languages, but not in the vague "there are multiple languages present"
sense, only at the specific per-element level. This is compatible with all
the systems I'm aware of except HTTP. For example, RDF only supports one
language per text literal, and spelling checkers generally expect a single
language per word.
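
A minimal sketch of that per-node model, in Python. The Element class is a made-up stand-in rather than a real DOM API, and the parameter names are assumptions. The lookup order follows the spec text quoted at the start of the thread: the nearest lang declaration, then the document-wide default, then the higher-level protocol, then unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element; not a real DOM API."""
    name: str
    lang: Optional[str] = None        # value of the lang attribute, if present
    parent: Optional["Element"] = None


def language_of(node: Element,
                document_default: Optional[str] = None,
                protocol_fallback: str = "") -> str:
    """Resolve the single language that applies to a node."""
    current = node
    while current is not None:
        if current.lang is not None:
            return current.lang       # nearest declaration wins, even lang=""
        current = current.parent
    if document_default is not None:
        return document_default       # document-wide default language
    return protocol_fallback          # e.g. from HTTP; "" means unknown


html = Element("html")
p = Element("p", lang="en", parent=html)
q = Element("q", lang="fr", parent=p)
em = Element("em", parent=p)
print(language_of(q), language_of(em), repr(language_of(html)))
```

Every node ends up with exactly one answer, which is the sense in which a list of languages has no place to go in this model.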

In fact, based on what I've seen of the way the relevant HTTP headers are
used, I would personally recommend just changing the HTTP spec to only
allow one language there also, since few people use this to specify
multiple languages, and I'm not aware of any software that makes use of
this information.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: what's the language of a document ?

by Roy T. Fielding

On Oct 26, 2009, at 10:14 PM, Ian Hickson wrote:

> It's not that the default is monolingual, so much as the model used by
> HTML has a single language per Element node. HTML itself supports multiple
> languages, but not in the vague "there are multiple languages present"
> sense, only at the specific per-element level. This is compatible with all
> the systems I'm aware of except HTTP. For example, RDF only supports one
> language per text literal, and spelling checkers generally expect a single
> language per word.

How is that not compatible with HTTP?

> In fact, based on what I've seen of the way the relevant HTTP headers are
> used, I would personally recommend just changing the HTTP spec to only
> allow one language there also, since few people use this to specify
> multiple languages, and I'm not aware of any software that makes use of
> this information.

The HTTP headers refer to the entire representation.   If the
representation is intended to have an audience of multiple
languages, as is often the case when side-by-side translation
is desired or mandated, then the content should be labeled
appropriately.  That use case is often found in government
documents, poetry, lieder, language lessons, dictionaries, etc.

I would expect HTML content to be tagged as a single language,
if any, at some element level, whereas meta and link should
support multiple languages at the resource or representation
level.

....Roy


Re: what's the language of a document ?

by "Martin J. Dürst"

On 2009/10/27 14:14, Ian Hickson wrote:

> It's not that the default is monolingual, so much as the model used by
> HTML has a single language per Element node. HTML itself supports multiple
> languages, but not in the vague "there are multiple languages present"
> sense, only at the specific per-element level. This is compatible with all
> the systems I'm aware of except HTTP. For example, RDF only supports one
> language per text literal,

Yes. It is, however, also possible (though not necessarily as easy as it
should have been) to use an XML literal in RDF, in which case the
whole literal can contain text from multiple languages.


Regards,   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


Re: what's the language of a document ?

by Ian Hickson

On Tue, 27 Oct 2009, "Martin J. Dürst" wrote:

> On 2009/10/27 14:14, Ian Hickson wrote:
> >
> > It's not that the default is monolingual, so much as the model used by
> > HTML has a single language per Element node. HTML itself supports
> > multiple languages, but not in the vague "there are multiple languages
> > present" sense, only at the specific per-element level. This is
> > compatible with all the systems I'm aware of except HTTP. For example,
> > RDF only supports one language per text literal,
>
> Yes. It is however also possible (not necessarily as easy as it should
> have been) to use an XML literal in RDF, however, in which case the
> whole literal can contain text from multiple languages.

Sure, but each character information item in the XML literal's infoset can
only have one resulting language, just like in HTML, so this isn't any
different and is similarly incompatible with what HTTP can express.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: what's the language of a document ?

by Simon Pieters-3

On Tue, 27 Oct 2009 02:30:36 +0100, Ian Hickson <ian@...> wrote:

> On Sun, 25 Oct 2009, Divya Manian wrote:
>>
>> Internationalization best practices [1] states:
>>
>> “Where a document contains content aimed at speakers of more than one
>> language, use Content-Language with a comma-separated list of language
>> tags.”
>>
>> The HTML 5 specs [2] state:
>>
>> “…there is a document-wide default language set, then that is the
>> language of the node.
>>
>> If there is no document-wide default language, then language information
>> from a higher-level protocol (such as HTTP), if any, must be used as the
>> final fallback language. In the absence of any language information, the
>> default value is unknown (the empty string).”
>>
>> What is not clear is, what happens if a HTML document has a HTTP header
>> Content-Language has a comma-separated list of language tags and no other
>> language declarations? I found on a thread [3] that states such a document
>> will be declared to use "unknown" language in this case. It would be good to
>> have this case explicitly stated.
>
> I've updated the spec to say that when the higher-level protocol reports
> multiple languages, they are all ignored in favour of the default
> (unknown).

This doesn't match what's specced for <meta http-equiv=content-language  
content=foo,bar>. Maybe the <meta> should be aligned and say that when  
there's a comma, the element is ignored?

http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#attr-meta-http-equiv-content-language

--
Simon Pieters
Opera Software


Re: what's the language of a document ?

by Ian Hickson

On Tue, 27 Oct 2009, Simon Pieters wrote:
>
> This doesn't match what's specced for <meta http-equiv=content-language
> content=foo,bar>.

That's intentional, and is based on data about how people actually use
that pragma.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: what's the language of a document ?

by "Martin J. Dürst"

On 2009/10/27 19:37, Ian Hickson wrote:
> On Tue, 27 Oct 2009, Simon Pieters wrote:
>> This doesn't match what's specced for <meta http-equiv=content-language
>> content=foo,bar>.
>
> That's intentional, and is based on data about how people actually use
> that pragma.

There's always a way to justify inconsistent choices (be it browser
implementations, 'data' about how people (who?) use some feature (at
what point in time?), ...). But it would be way better to be consistent.

And there is always a way to justify making choices that nobody
except those who know all the details of the spec understands. But
it would be way better to make choices that are easy to understand (e.g.
http-equiv actually meaning what it says, namely "equivalent to the
corresponding HTTP header").

There are lots of cases where, over time, people have come to a better
understanding of how things work. For stuff that authors/producers
aren't supposed to produce, I don't mind too much that HTML5 is
hopelessly complex and inconsistent. I can live without remembering it
all, and can tell others to avoid it. However, for stuff like the above,
which may be used even by very consciously clean developers, creating
inconsistencies such as the above leaves a heavy negative legacy.

Regards,   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


RE: what's the language of a document ?

by Phillips, Addison

Tex,

You have never been allowed to tag individual elements with more than one language tag. I think what Hixie is saying is that, for any given span of text in the document, there can be exactly one language associated with it. When the language cannot be determined (perhaps due to conflicting information, such as a list), no language is applied.

In the Internationalization WG's tutorial on tagging language in HTML, the lang attribute is called the "document processing language". The outer-most element in an HTML document is <html> and the language declared on that element is the default for the document. This does not make HTML monolingual.

The Content-Language header (and associated META tag), by contrast, can be used to declare the intended audience of a document. Certainly a document can serve more than one audience and be in more than one language.

I don't think I agree that the "default" value (when that language is not declared) ought to be the tag 'und', but I don't think that's what Hixie is saying. There is a subtle difference between "the language of this document has not been determined" and making the tag actually be 'und'. I hope that the default value remains the empty tag, not something else.

>
> So if someone attempts to be specific and declares content-language
> to be "es-mx,es-ar" for mexico and argentina,
> or perhaps declares "en, en-us" then that information is thrown
> away in favor of unknown?

I would say "that information isn't artificially applied to specific elements in the document".

>
> Also, does this change to the document default language impact just
> html behavior, or embedded scripting languages as well?

Actually, it's not a change: this has always been true.

Embedded scripting languages aren't "in a language" from the point of view of HTML. When they access the DOM tree, they can access the language tagging hierarchy like any other DOM processor (although this isn't always convenient).

>
> If there were code that checks for language and performs different
> actions based on languages in the document, that is affected as
> well?

Code where?

Presumably code processing a document would process spans of text, not the entire document all at once.

>
> Why does the default need to be monolingual?

Because that's how xml:lang and lang work. Besides, they aren't "monolingual" per-se. They are "one language tag for a given context", with nesting. If one wishes to mark up say a French word in an English sentence, use a <span> (or other element) to do it:

   <p lang="en">In no sense is this sentence in two languages, even though it
                contains the word <q lang="fr">raclette</q>.</p>
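
To show how a processor might see such nested markup as spans of text, each with exactly one language, here is a small Python sketch built on the standard library's html.parser. The class name is made up, and the sketch assumes every start tag has a matching end tag (it does not handle void elements such as br).

```python
from html.parser import HTMLParser


class LangRuns(HTMLParser):
    """Collect (language, text) pairs, one language per run of text."""

    def __init__(self):
        super().__init__()
        self.stack = [""]   # "" = unknown, used when nothing is declared
        self.runs = []

    def handle_starttag(self, tag, attrs):
        declared = dict(attrs).get("lang")
        # Inherit the enclosing language unless this element declares its own.
        self.stack.append(self.stack[-1] if declared is None else declared)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data.strip()))


parser = LangRuns()
parser.feed('<p lang="en">This sentence contains the word '
            '<q lang="fr">raclette</q>.</p>')
parser.close()
for language, text in parser.runs:
    print(language, repr(text))
```

The word inside the q element is reported as French and everything around it as English, so no run of text ever carries two tags.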

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.





RE: what's the language of a document ?

by Phillips, Addison

One small additional reference:

   http://www.w3.org/International/tutorials/language-decl/

I think this illustrates the current state of affairs pretty well and I don't think there is any reason to change it, except maybe to clarify it a bit. Personally I think that if <html> doesn't have either a lang or an xml:lang attribute, the default value should continue to be the empty tag.

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


Re: what's the language of a document ?

by John Cowan

Ian Hickson scripsit:

> In fact, based on what I've seen of the way the relevant HTTP headers are
> used, I would personally recommend just changing the HTTP spec to only
> allow one language there also, since few people use this to specify
> multiple languages, and I'm not aware of any software that makes use of
> this information.

HTTP is not specific to HTML, and there is every reason why it should have a
broader model than HTML.

--
John Cowan
        cowan@...
                I am a member of a civilization. --David Brin


Re: what's the language of a document ?

by Ian Hickson

On Tue, 27 Oct 2009, "Martin J. Dürst" wrote:

> On 2009/10/27 19:37, Ian Hickson wrote:
> > On Tue, 27 Oct 2009, Simon Pieters wrote:
> > > This doesn't match what's specced for <meta
> > > http-equiv=content-language content=foo,bar>.
> >
> > That's intentional, and is based on data about how people actually use
> > that pragma.
>
> There's always a way to justify inconsistent choices (be it browser
> implementations, 'data' about how people (who?) use some feature (at
> what point in time?),...). But it would be way better to be consistent.

Sure, but it's even better to be in line with how authors are actually
using the feature. A few years back, when speccing the Content-Language
pragma, the data I looked at indicated that most authors don't use the
pragma in a way consistent with the meaning of the HTTP header. They
instead use it as a default for setting the document language. There is
more value, IMHO, in making pages work, than in being consistent with a
rarely used feature from HTTP, especially given that that feature is of
dubious benefit as specced anyway.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: what's the language of a document ?

by Leif Halvard Silli-4

Phillips, Addison On 09-10-27 16.18:

> One small additional reference:
>
> http://www.w3.org/International/tutorials/language-decl/
>
> I think this illustrates the current state of affairs pretty
> well and I don't think there is any reason to change it, except
> maybe to clarify it a bit. Personally I think that if <html>
> doesn't have either a lang or an xml:lang attribute, the
> default value should continue to be the empty tag.

I filed a bug about this (in fact very old) issue [1]. I recommend
that the interested parties "sit" on that bug. If there is no satisfactory
outcome, it will be handled by the WG according to process.

I also recommend that the question of what the default language should be
for a document without any language tags applied be filed as a bug. One
doesn't need to be a member of the WG to file a bug.

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=8088
--
leif halvard silli


RE: what's the language of a document ?

by Richard Ishida

Personally, I agree with Martin here.  I have spent a long time trying
to simplify explanations so that people can understand how to manage the
various different ways of declaring language in HTML (http vs meta vs lang;
html vs xhtml vs xml), and it really concerns me that I will now have to say
"But in html5 things are slightly different again".    It's already hard
enough to get people to declare language, and I think that the changes that
come with the current text in html5 will only make things worse by causing
further confusion. On the other hand, I think there may be a way to satisfy
everyone.

We discussed this during the Internationalization WG telecon last night, and
I was actioned to put the following to you and the HTML group on behalf of
the i18n WG.


Our proposal is as follows and is based on the text of the following
sections:
http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#document-wide-default-language
http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-lang-and-xml:lang-attributes


[1] Explain clearly that declarations in the http header and the meta
element refer to the document as an object, rather than the text in a
specific element (this is what makes the distinction between single and
multiple values sensible).

[2] Continue to recommend that the document-wide default language be defined
by a lang attribute on the html tag, but say that if the lang attribute is
missing and there is a language defined in the http or meta, then those
language declarations can be used to guess the language of the text, if they
contain a single value.

[3] Establish the precedence between http vs meta.  

[4] Establish the rule that multiple values in the place that has precedence
equates to lang="".
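
One possible reading of points [2] to [4], sketched in Python purely for illustration. The function and parameter names are hypothetical, and since the proposal leaves the precedence between HTTP and meta to be established, the http_first flag is only a placeholder for whichever order is chosen.

```python
def single_value(declaration):
    """Return the tag if the declaration holds exactly one, else None."""
    tags = [tag.strip() for tag in declaration.split(",") if tag.strip()]
    return tags[0] if len(tags) == 1 else None


def default_text_language(html_lang, http_value, meta_value, http_first=True):
    """Guess the language used for text processing; "" means lang=""."""
    # [2] A lang attribute on the html element always wins.
    if html_lang is not None:
        return html_lang
    # [3] Precedence between HTTP and meta is an open question in the
    #     proposal; http_first is a placeholder, not a decision.
    ordered = (http_value, meta_value) if http_first else (meta_value, http_value)
    for declaration in ordered:
        if declaration is not None:
            # [2]/[4] A single value is used as a guess; multiple values in
            #         the place that has precedence equate to lang="".
            return single_value(declaration) or ""
    return ""


print(repr(default_text_language(None, "en", "de")))
print(repr(default_text_language(None, "fr-FR, fr-LU", "de")))
```

Note that under this reading the multi-valued declaration stays valid as a statement about the document's audience; it simply contributes nothing to the language of the text.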

This is very close to what we already have, but doesn't try to make the meta
declaration a different thing than the http declaration, or change it so
that multiple values are no longer valid.  At the same time, it allows
either the http or the meta to provide language information for
text-processing, if the declaration is useable.

We also feel that the spec seems to restrict the use of the term
'document-wide default language' to refer only to a language declared using
the meta, and this is rather odd.  We feel that in fact the lang attribute
on the html element also establishes a document-wide default language. (See
the text: "Until the pragma is successfully processed, there is no
document-wide default language.")

RI

PS: I could suggest some changes to the wording, if that helps.


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/




> -----Original Message-----
> From: www-international-request@... [mailto:www-international-
> request@...] On Behalf Of "Martin J. Dürst"
> Sent: 27 October 2009 11:09
> To: Ian Hickson
> Cc: Simon Pieters; Divya Manian; Martin Kliehm; John Cowan; <public-
> html@...>; www-international@...
> Subject: Re: what's the language of a document ?




RE: what's the language of a document ?

by CE Whitehead

I personally tend to agree with Roy Fielding, John Cowan, and Tex Texin, actually, and not with Martin and Richard Ishida, because I regularly create documents in two languages (French-English; French-Old French). Following Richard Ishida's recommendations in "Specifying Languages in XHTML and HTML Content," I list all the languages in the meta content tag (when I have access to it; because my documents are generally served from a locale I don't control, I don't have access to the HTTP headers). I still set the html language to one or the other when possible and then, if I get time, specify additional information in relevant elements.
 
I think there will always be cases where people will not tag a document correctly; if a tag is needed it makes no sense to eliminate it because someone cannot yet use it properly.  And I think that Tex makes a point too--someone might specify a document language as fr-FR and fr-LU but not fr-CA and it makes no sense to default to unknown.
 
However I'll look at the proposal.
 
Best,
 
C. E. Whitehead 

> From: ishida@...
> To: ian@...
> CC: simonp@...; divya.manian@...; martin.kliehm@...; cowan@...; public-html@...; www-international@...; duerst@...
> Date: Thu, 29 Oct 2009 18:11:27 +0000
> Subject: RE: what's the language of a document ?
> > -----Original Message-----
> > From: www-international-request@... [mailto:www-international-request@...]
> > On Behalf Of "Martin J. Dürst"
> > Sent: 27 October 2009 11:09
> > To: Ian Hickson
> > Cc: Simon Pieters; Divya Manian; Martin Kliehm; John Cowan;
> > <public-html@...>; www-international@...
> > Subject: Re: what's the language of a document ?
> >
> > On 2009/10/27 19:37, Ian Hickson wrote:
> > > On Tue, 27 Oct 2009, Simon Pieters wrote:
> > >> This doesn't match what's specced for <meta http-equiv=content-language
> > >> content=foo,bar>.
> > >
> > > That's intentional, and is based on data about how people actually use
> > > that pragma.
> >
> > There's always a way to justify inconsistent choices (be it browser
> > implementations, 'data' about how people (who?) use some feature (at
> > what point in time?),...). But it would be way better to be consistent.
> >
> > And there is always a way to justify making choices that everybody
> > except those knowing all the details of the spec don't understand. But
> > it would be way better to make choices that are easy to understand (e.g.
> > http-equiv actually meaning what it says, namely "equivalent to the
> > corresponding HTTP header").
> >
> > There are lots of cases where over time, people have come to a better
> > understanding of how things work. For stuff that authors/producers
> > aren't supposed to produce, I don't mind too much that HTML5 is
> > hopelessly complex and inconsistent. I can live without remembering it
> > all, and can tell others to avoid it. However, for stuff like the above,
> > which may be used even by very consciously clean developers, creating
> > inconsistencies such as the above is a heavy negative legacy.
> >
> > Regards, Martin.
> >
> > --
> > #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> > #-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...
>
>
>

Re: what's the language of a document ?

by "Martin J. Dürst"

On 2009/10/30 3:47, CE Whitehead wrote:
> I personally tend to agree with Roy Fielding, John Cowan, and Tex Texin actually, and not with Martin and Richard Ishida, because I regularly create documents in two languages (French-English; French-Old French); following Richard Ishida's recommendations in "Specifying Languages in XHTML and HTML Content," I list all the languages in the meta content tag (when I have access to it; because my documents are generally served from a locale I don't control, I don't have access to the http headers).  I still set the html language to one or the other when possible and then, if I get time, specify additional information in relevant elements.

I'm sorry, but can you please explain where Richard and I differ from
Roy/John/Tex? It could be that we have very minor differences of how we
have expressed ourselves, but I think we all agree that HTML5 has to be
changed to treat the Content-Language: HTTP response header and the
corresponding <meta> "pragma" the same way.

> I think there will always be cases where people will not tag a document correctly; if a tag is needed it makes no sense to eliminate it because someone cannot yet use it properly.

I have to say that I slightly prefer ignoring multiple values in
Content-Language: or the corresponding "pragma" to taking the first
value for the default language, but that's a minor issue.

> And I think that Tex makes a point too--someone might specify a document language as fr-FR and fr-LU but not fr-CA and it makes no sense to default to unknown.

Well, there are thousands of cases where it's extremely easy for humans
to say "well, the author probably must have meant 'foo'", but if you
actually try to go through all the possibilities and make sure a
computer can do it, then it very quickly becomes very difficult.

As for the "fr-FR and fr-LU but not fr-CA" example, using "fr" as a
default may seem obvious to some, but then that would include "fr-CA",
which the author actually didn't include. So just using "fr" would
actually be wrong.
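
To make the "fr-FR and fr-LU but not fr-CA" point concrete, here is a
minimal Python sketch. It assumes simple prefix matching of language tags
(in the style of RFC 4647 basic filtering); the helper names `matches` and
`common_prefix` are hypothetical and exist only for this illustration.

```python
def matches(range_tag, tag):
    """Prefix match on whole subtags: 'fr' matches 'fr' and 'fr-CA', not 'fra'."""
    range_tag, tag = range_tag.lower(), tag.lower()
    return tag == range_tag or tag.startswith(range_tag + "-")


def common_prefix(tags):
    """Longest run of leading subtags shared by every tag ('' if none)."""
    split = [t.lower().split("-") for t in tags]
    shared = []
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        shared.append(parts[0])
    return "-".join(shared)


declared = ["fr-FR", "fr-LU"]              # what the author actually listed
fallback = common_prefix(declared)         # the "obvious" guess
covers_fr_ca = matches(fallback, "fr-CA")  # the guess also covers fr-CA
```

Under these assumptions the guessed default also matches a tag the author
never declared, which is why collapsing a list to its shared prefix can say
more than the author did.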

Regards,    Martin.


--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...
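
The i18n WG proposal quoted earlier in this thread (points [1] to [4] in
Richard Ishida's message) can be sketched roughly as follows. This is only
an illustration under stated assumptions: the proposal asks for a
precedence between http and meta to be established but does not say which
wins, so the order is a parameter here and its default is arbitrary;
falling through to the other source when the preferred one is absent is
also an assumption. The function names are hypothetical.

```python
def parse_language_list(value):
    """Split a Content-Language style value into a list of trimmed tags."""
    if not value:
        return []
    return [t.strip() for t in value.split(",") if t.strip()]


def default_text_language(html_lang, http_value, meta_value,
                          precedence=("http", "meta")):
    """Guess the language of the text; '' stands for unknown (lang="").

    html_lang is None when the lang attribute on <html> is missing.
    precedence is an ASSUMPTION: the proposal leaves the order open.
    """
    # Point [2]: a lang attribute on the html element always wins.
    if html_lang is not None:
        return html_lang
    sources = {"http": http_value, "meta": meta_value}
    for name in precedence:
        tags = parse_language_list(sources[name])
        if not tags:
            continue  # assumption: an absent source defers to the next one
        # Point [4]: multiple values in the place that has precedence
        # equate to lang=""; a single value is used as the guess.
        return tags[0] if len(tags) == 1 else ""
    return ""
```

For example, `default_text_language(None, "fr, en", "de")` gives the empty
string with the default order, because the source that has precedence
lists more than one language, while `default_text_language(None, None, "de")`
gives `"de"`.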
