Re: japanese encoding nightmare

On 13 Nov 2006, at 10:50, Paul Arenson wrote:
> UNSUCCESSFUL EXAMPLE (Looks ok on desktop but not on server)
> http://tokyoprogressive.org/why.html
>
> CODE
> <meta content="text/html; charset=UTF-8" http-equiv="content-type">

But this page is not in UTF-8; it is in Shift_JIS. Either save your page as UTF-8 or change the encoding information to
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=Shift_JIS">

> SUCCESSFUL EXAMPLE ONE (JAPANESE COMES OUT RIGHT)
> http://www.tokyoprogressive.org/index/weblog/print/april-entries/

Yes, the page really is UTF-8. Not valid, but UTF-8.
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.tokyoprogressive.org%2Findex%2Fweblog%2Fprint%2Fapril-entries%2F

> This was made via EXPRESSION ENGINE
>
> I note I have both xml:lang and UTF-8.

xml:lang doesn't influence the display of the page. It is there, for example, to trigger the right accent when the text is passed through a voice browser, to help translation engines (not sure they implement it, though), or to help a spelling checker choose the right dictionary.

I would recommend that you stick to UTF-8; it would help keep the way you serve your pages consistent.

A cool plug-in that could be developed and added to LogValidator:
http://www.w3.org/QA/Tools/LogValidator/

Given a list of URIs, create a table with
  uri  server_encoding  meta_encoding  guessed_encoding

Would someone on the list like to do that?
http://www.w3.org/QA/Tools/LogValidator/Manual-Modules

> I THOUGHT I did this in UTF-8, but no.
> Mozilla even says it is UTF-8, but as you can see the code is
> western.
> In other words, why does it work?

Because browsers try to display wrong pages (invalid, wrong encoding, etc.), people who develop Web pages do not know that they have done something wrong, and they do not fix it. IMHO it is a mistake on the browsers' part. It is cool to try to recover and display the page, but it is wrong to recover silently, because then we never enter a cycle that helps everyone fix things and have a better experience.

> SUCCESSFUL EXAMPLE FOUR (most bizarre?)
> I even forgot to add the meta tag!!!
> http://tokyoprogressive.org/

By default the server sends encoding information, which usually has priority over the information contained in the file. The encoding in a file is a guess, and the browser _should_ follow what the server says.

> Make a page in several encodings
> http://tokyoprogressive.org/a.html
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> <html>
> <head>
> <meta content="text/html; charset=ISO-2022-JP"
> LOOKS OK ONLINE

It doesn't look ok for me. But your server is configured in a strange way:

GET /a.html HTTP/1.1[CRLF]
Host: tokyoprogressive.org[CRLF]
Connection: close[CRLF]
Accept-Encoding: gzip[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
Accept-Language: fr,en;q=0.9,ja;q=0.9,de;q=0.8,es;q=0.7,it;q=0.7,nl;q=0.6,sv;q=0.5,nb;q=0.5,da;q=0.4,fi;q=0.3,pt;q=0.3,zh-Hans;q=0.2,zh-Hant;q=0.1,ko;q=0.1[CRLF]
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7) Gecko/20060911 Camino/1.0.3 Web-Sniffer/1.0.24[CRLF]
Referer: http://web-sniffer.net/[CRLF]
[CRLF]

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]

You serve first iso-8859-1 and then utf-8 and then anything. Maybe one of the sources of your problems is there.

1. Change all your pages to one encoding only: utf-8.
2. Change the configuration of your server to send only utf-8.

--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***

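The table Karl proposes (uri, server_encoding, meta_encoding, guessed_encoding) could be sketched roughly as follows. This is only an illustration and not a LogValidator module: the function names, the candidate list and the sample bytes are assumptions made for this sketch, and the "guess" is deliberately crude (it reports the first candidate that decodes without error).

```python
import re
from email.message import Message

# Encodings tried in order when guessing; this list is an assumption.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp")

META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def server_encoding(content_type):
    """Charset parameter of an HTTP Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_encoding(body):
    """Charset declared in a <meta> element of the page bytes, or None."""
    match = META_CHARSET.search(body)
    return match.group(1).decode("ascii").lower() if match else None


def guessed_encoding(body):
    """First candidate encoding that decodes the bytes without error."""
    candidates = CANDIDATES
    # ISO-2022-JP is 7-bit and would also pass as UTF-8, so its escape
    # sequence is checked for before the other candidates are tried.
    if b"\x1b$" in body:
        candidates = ("iso2022_jp",) + CANDIDATES
    for name in candidates:
        try:
            body.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def encoding_report(uri, content_type, body):
    """One row of the proposed table."""
    return (uri, server_encoding(content_type), meta_encoding(body),
            guessed_encoding(body))


# Hypothetical page: declares UTF-8 but contains Shift_JIS bytes.
sample = (b'<meta content="text/html; charset=UTF-8" http-equiv="content-type">'
          + "\u65e5\u672c\u8a9e".encode("shift_jis"))
print(encoding_report("http://example.org/why.html", "text/html", sample))
```

A row whose meta_encoding and guessed_encoding disagree, as in the sample, is the situation described above for why.html: the declaration says one thing and the bytes say another.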
Re: japanese encoding nightmare

__/__/__/__/__/__/__/__/__/__/
Paul Arenson
EMAIL
PHONE & VOICE MAIL
1-617-379-0761 (U.S.)
090-4173-3873 (Japan)
paularenson (Skype)
__/__/__/__/__/__/__/__/__/__/

On Nov 13, 2006, at 10:22 PM, Karl Dubost wrote:

Ok... Way back, when I used the predecessor to Expression Engine, the encoding was something other than Unicode. Then, when I upgraded to Unicode, I asked the guy who helped me, and he changed something in the program or on my server (using the database???). After he did that, the new pages, like the one above, came out good, though the old pages did not. Perhaps what he did to make Expression Engine work has to do with the server? As I said, pages look good on my desktop but not on the server...

Anyway, I am a bit lost. Is this something that the person who adjusted my database did when he set things up for Expression Engine, and does it affect all pages on the server? How do I fix the server (it is run by a commercial company)?

Thanks!

Re: japanese encoding nightmare

Paul Arenson wrote:
...
>>> CODE
>>> <meta content="text/html; charset=UTF-8" http-equiv="content-type">
>>
>> but this page is not in utf-8 but in shift-jis
>>
>> Either you have to save your page as utf-8 or to change the encoding
>> information to
>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=Shift_JIS">
>
> It is? I don't recall using that. hmmm. And when i save to desktop,
> changing to shift jis doesn't help, nor does looking at it on the web.
> Oh well....

Remember that <META HTTP-EQUIV="..." ...> elements are not supposed to be read by the browser when the browser has retrieved the document from a server. Such META elements are for the server to read and use to construct real HTTP header fields (if the server chooses that mechanism).

(When dereferencing a "file:..." URL, there is no explicit service, so browsers are probably allowed to read META elements, but they very well might not.)

Daniel

RE: japanese encoding nightmare

Daniel Barclay wrote:
>> Remember that <META HTTP-EQUIV="..." ...> elements are not supposed
>> to be read by the browser when the browser retrieved the document
>> from a server.
>> Such META elements are for the server to read and use to construct
>> real HTTP header fields (if the server chooses that mechanism).

I recently read (in what I remember to be an authoritative source) that in practice servers rarely read them, for performance reasons, so the browser has to. (The only authoritative thing I can remember reading recently was Weaving the Web, but I don't think TBL covered that in there. I wish my memory were better...)

This http://www.w3.org/TR/html4/struct/global.html#adef-http-equiv says (emphasis mine): "HTTP servers *MAY* use the property name specified by the http-equiv attribute to create an [RFC822]-style header in the HTTP response." That would imply they might not, and if so the browser would have to handle it, no?

Anyway, I just wanted to point this out (it is a shame the recommendation didn't say "MUST" instead of "MAY").

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/

Re: japanese encoding nightmare

On Mon, Nov 13, 2006 at 06:11:18PM -0500, Mike Schinkel wrote:
> This http://www.w3.org/TR/html4/struct/global.html#adef-http-equiv says
> (emphasis mine): "HTTP servers *MAY* use the property name specified by the
> http-equiv attribute to create an [RFC822]-style header in the HTTP
> response." That would imply they might not, and if so the browser would
> have to handle, no?

No, just that the server should use some other means to determine the character encoding of the document (generally "use the configured value").

> Anyway, just wanted to point this out (it is a shame the recommendation
> didn't say "MUST" instead of "MAY")

Ouch, every HTTPD an HTML parser? Ouch!

--
David Dorward http://dorward.me.uk

JAPANESE WOES

Thanking Greg at Nexcess.net and the many people at public-evangelist@... such as Karl Dubost <karl@...>, etc.

SUMMARY

(1) I have done two tests of my problem of unreadable Japanese (where I never had this problem before) and found that working at home on Mac OS X, creating files in Mozilla (which previously worked) and uploading them to tokyoprogressive.org and tokyoprogressive.org.uk (two companies), fails in all encodings of Japanese.

(2) I wrote to the w3.org list and requested help. I got an explanation, but it was above my head (sorry).

(3) I have tested at work on Windows 2000, and this time UTF-8 works on both servers plus Google. Shift_JIS works on only one of the servers.

(4) Conclusion? Could something suddenly be wrong with my Mac? Should I try another Mac at home? Could it be my internet provider? The fact that the files work (all UTF-8 versions, at least) from the Windows machine at work (also Mozilla) and do not work from home seems to say something happened to my Mac.

DETAILS BELOW FOR TESTING

LAST NIGHT FROM HOME (MAC/MOZILLA)
http://tokyoprogressive.org/testz.html
http://tokyoprogressive.org.uk/testz.html
http://tokyoprogressive.org/testzz.html
http://tokyoprogressive.org.uk/testzz.html

TODAY AT WORK
Then today, at work, on a Windows 2000 machine, I used Mozilla and again created two files: this time a UTF-8 file and a Shift_JIS file. (I prefer UTF-8 but wanted to check.) This time the results were more encouraging. I uploaded to 3 places:

NO GOOD (SHIFT JIS) http://docs.google.com/View?docid=dfztwqbx_31fcz6hv
GOOD http://docs.google.com/View?docid=dfztwqbx_32p97g5t
NO GOOD http://tokyoprogressive.org/shiftjis.html
GOOD http://tokyoprogressive.org/uft8.html
GOOD http://tokyoprogressive.org.uk/shiftjis.html
GOOD http://tokyoprogressive.org.uk/uft8.html

There are questions as yet unclear about server configurations, but I thought it significant that things have worked from this Windows machine.

Thanks

> I do not think it is the server, because I just took two more files.
> One was created before and called testz. The other I created now in
> Mozilla using UTF-8.
> Called testzz, I uploaded both to two different servers and both came
> out wrong.
>
> Is my Mozilla corrupted?
>
> http://tokyoprogressive.org/testz.html
> http://tokyoprogressive.org.uk/testz.html
>
> http://tokyoprogressive.org/testzz.html
> http://tokyoprogressive.org.uk/testzz.html
>
> Going to bed, it is midnight here. Good night, and thanks.
>
> On Nov 13, 2006, at 11:40 PM, Greg Swaney wrote:
>
>> I did a lot of poking and changing character sets on your account
>> on Sunday and it never showed the characters how they were supposed
>> to be shown. What did w3 say?
>>
>> Paul Arenson wrote:
>>> Hi Greg
>>> Further to my Sunday post about files I create in various
>>> encodings using Mozilla looking ok on my desktop but not on the
>>> server, I wrote to w3.org and they advised me, but it is way over
>>> my head.
>>> What I am guessing is that files created by Expression Engine
>>> output in Unicode (UTF-8) and somehow something on the server
>>> (database?)
>>> tells the server to do something to the encoding. Anyway, when I
>>> create a UTF-8 encoding on my desktop, it is served differently on
>>> the site.....
>>> I still use Expression Engine, but also use my own pages.
>>> Maybe I should contact the guy who set up Expression Engine for me?
>>> I am totally lost....though perhaps it is simple?
>>> Thanks!
>>> paul
>>> see below from the web person--> public-evangelist@...
>>> thanks
>>> Begin forwarded message:
>>>> Resent-From: public-evangelist@...
>>>> From: Karl Dubost <karl@...>
>>>> Date: November 13, 2006 10:22:09 PM JST
>>>> To: Paul Arenson <paul@...>
>>>> Cc: public-evangelist@...
>>>> Subject: Re: japanese encoding nightmare
>>
>> --
>> Greg Swaney
>> NEXCESS.NET Internet Solutions
>> http://nexcess.net
>> 304 1/2 S. State St.
>> Ann Arbor, MI 48104
>> 1.866.NEXCESS

Re: japanese encoding nightmare

Mike Schinkel wrote:
> Daniel Barclay wrote:
>
>>> Remember that <META HTTP-EQUIV="..." ...> elements are not supposed

I should narrow that to "some ... elements".

>>> to be read by the browser when the browser retrieved the document
>>> from a server.
>>> Such META elements are for the server to read and use to construct
>>> real HTTP header fields (if the server chooses that mechanism).
>
> I recently read (from what I remember to be an authoritative source) that in
> practice servers rarely ever read them because of performance so the browser
> has to.

In some cases, the browser is not even allowed to use them.

If the server indicates the content type and character encoding ("charset") in the HTTP response, the browser must use _that_ type and charset and must _not_ use values from a <META HTTP-EQUIV="Content-Type" ...> element or anything else in the returned entity (document) to determine the type and charset.

That is, the server's HTTP headers override any specifications inside the entity.

A server is supposed to be able to change the encoding of a document as long as it reports the encoding correctly in the Content-Type header. It is not supposed to have to change any <META HTTP-EQUIV="Content-Type" ...> elements. (Besides requiring any transcoding server to understand HTML, changing such elements would be changing the _contents_ of the document, not just changing its _encoding_ (changing the sequence of characters, not just changing the bytes that encode the characters).)

If the browser ignored the Content-Type header from the server and read a <META HTTP-EQUIV="Content-Type" ...> element, it might be trying to use the wrong encoding.

I thought that any browser that behaved differently (say, IE 6, which sometimes ignores "text/plain" from the server) violated some specification. However, looking at the HTML 4.01 specification, I only see wording about servers being allowed to read such elements:

- "HTTP servers use this attribute to gather information for HTTP response message headers"
- "HTTP servers may use the property name specified by the http-equiv attribute to create an [RFC822]-style header in the HTTP response."

Evidently my source was something else. I don't remember which document it was, so I don't know whether it was as authoritative as a specification. (I do think it was something from the W3C.)

Note that XML has a similar rule regarding the character encoding specified inside an XML document in the XML declaration ("<?xml encoding='...'?>"). If the character encoding is specified to the XML processor at a higher level (e.g., via an HTTP Content-Type header), then the processor must ignore the character encoding specification in the XML declaration. (Again, I can't find that in the XML specification itself, so I can't currently vouch for the authoritativeness of my source.)

Of course, that's all about the content type and encoding. Since I don't recall my source, I can't say whether most HTTP-EQUIV elements are like Content-Type (the browser must _not_ use them) or not (the browser can use them).

> This http://www.w3.org/TR/html4/struct/global.html#adef-http-equiv says
> (emphasis mine): "HTTP servers *MAY* use the property name specified by the
> http-equiv attribute to create an [RFC822]-style header in the HTTP
> response." That would imply they might not, and if so the browser would
> have to handle, no?

Not quite. It's not a server's not reading HTTP-EQUIV information from inside an HTML document that might imply that the browser should read it.

If the server read more-authoritative information from elsewhere (e.g., a server configuration file describing the documents to be served out) and reported it in an HTTP header, then the browser should not ignore its more-authoritative source (the server's HTTP response header) and instead read a less-authoritative source (the insides of the document).

However, it might be a server's not sending a header at all that implies that the browser can (or maybe should) use HTTP-EQUIV information. (I'm not sure that there's not a case where the server can choose not to return a certain header and where the browser should take that lack of a header as authoritative.)

Daniel

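The precedence Daniel describes (HTTP header first, the META element only when the header gives no charset) can be written down as a small Python sketch. It is an illustration under stated assumptions, not a description of what any particular browser does: the function names are made up here, and the final fallback value `iso-8859-1` is an assumed default for the case where neither source declares anything.

```python
import re
from email.message import Message

META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def charset_from_header(content_type):
    """Charset parameter of the HTTP Content-Type header, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(body):
    """Charset named in a META element inside the document, or None."""
    match = META_CHARSET.search(body)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body, default="iso-8859-1"):
    """Header wins; META is consulted only if the header has no charset."""
    return (charset_from_header(content_type)
            or charset_from_meta(body)
            or default)


page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
# The server's header overrides the declaration inside the document.
print(effective_charset("text/html; charset=Shift_JIS", page))
# With no charset in the header, the META element is used.
print(effective_charset("text/html", page))
```

Under this ordering, a page that declares UTF-8 in its META element is still read as whatever the server announces, which is one way a page can look right from a local file and wrong once served.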
RE: japanese encoding nightmare

Paul, read this and let me know if you still have questions:

Changing (X)HTML page encoding to UTF-8
http://www.w3.org/International/questions/qa-changing-encoding

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/

________________________________
From: public-evangelist-request@... [mailto:public-evangelist-request@...] On Behalf Of Paul Arenson
Sent: 13 November 2006 01:51
To: public-evangelist@...
Cc: Paul Arenson
Subject: japanese encoding nightmare

Hello

I came here via http://www.webstandards.org/learn/articles/askw3c/dec2002/

For a long time I have used Mozilla to create (or adapt other) web pages. It has worked. I went back and was surprised that it worked DESPITE the different encodings I inadvertently used. But recently I tried to make pages that did NOT work!!!! I am not sure why, and so I am writing.

UNSUCCESSFUL EXAMPLE (Looks ok on desktop but not on server)
http://tokyoprogressive.org/why.html
CODE
<meta content="text/html; charset=UTF-8" http-equiv="content-type">

Here are successful examples from the past:

- - - - - - - - - - - - -
SUCCESSFUL EXAMPLE ONE (JAPANESE COMES OUT RIGHT)
http://www.tokyoprogressive.org/index/weblog/print/april-entries/
This was made via EXPRESSION ENGINE
I note I have both xml:lang and UTF-8. I also note I am confused about the differences between character encoding and language, but anyway, it works.
CODE
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
<head>
<title>April entries</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

- - - - - - - - - - - - -
SUCCESSFUL EXAMPLE TWO
http://tokyoprogressive.org/indexoct2006.html
THIS WAS MADE BY HAND USING a CSS TEMPLATE.
I THOUGHT I did this in UTF-8, but no. Mozilla even says it is UTF-8, but as you can see the code is western. In other words, why does it work?
CODE
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

- - - - - - - - - - - - -
SUCCESSFUL EXAMPLE THREE
http://tokyoprogressive.org/indexnov2006.html
Now here is one where I specified UTF-8 and it too is ok!
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

SUCCESSFUL EXAMPLE FOUR (most bizarre?)
I even forgot to add the meta tag!!!
http://tokyoprogressive.org/

- - - - - - - - - - - - -
PROBLEMS STARTED APPEARING WITH NEW PAGES

EXPERIMENT:
Method: Make a page in several encodings

http://tokyoprogressive.org/a.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-2022-JP"
LOOKS OK ONLINE

- - - - - - - - - - - - -
http://tokyoprogressive.org/b.html
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
DOES NOT LOOK OK ONLINE

- - - - - - - - - - - - -
http://tokyoprogressive.org/c.html
<meta content="text/html; charset=Shift_JIS" http-equiv="content-type">
DOES NOT LOOK OK ONLINE

- - - - - - - - - - - - -
http://tokyoprogressive.org/d.html
<meta content="text/html; charset=EUC-JP" http-equiv="content-type">
DOES NOT LOOK OK ONLINE

- - - - - - - - - - - - -
CONCLUSION:
Can anyone tell me what is going on?

Thanks!

__/__/__/__/__/__/__/__/__/__/
Paul Arenson
paul@...
__/__/__/__/__/__/__/__/__/__/

RE: japanese encoding nightmare

Hi Richard,

That page seems incomplete and potentially dangerous.

1) Simply saying to save as utf-8 ignores the problem of knowing which encoding you are starting from. Often text is thought to be iso-8859-1, big-5 or some other encoding when it is actually 1252, big5-hkscs, or another variant or a different encoding altogether. If the assumed source encoding is incorrect, the conversion to utf-8 may result in the wrong characters and data loss. The document should make sure users proactively identify the correct encoding of the page before transcoding.

2) When converting text or HTML to utf-8, special consideration needs to be given to URLs. A URL has 4 parts: scheme, domain, path and query. Schemes are ASCII and not a problem to convert to utf-8, as they remain ASCII. Domains and paths should be convertible to UTF-8. (They will go through additional conversions to an ASCII form before going over the wire.)

However, the query portion of a URL is not necessarily convertible to Unicode. The query portion represents data that is used as a reference within some other application pointed to by the remainder of the URL. That application may require an encoding other than UTF-8, or the data may not be textual. Conversion to utf-8 may therefore damage the URL.

For example, I might have a CGI and database application based on iso-8859-1. The original URL might be the following contrived example (I left off the scheme http: since it isn't a working URL):

www.i18nguy.com/?find=café

In a page encoded as iso-8859-1, the e-acute will be represented by a single byte, 0xE9. The i18nguy.com CGI and database application will expect to match the byte 0xE9. If the URL is transcoded to UTF-8, the e-acute becomes two bytes, represented in the URL by hex encoding as %C3%A9. The URL will no longer work unless the application is also modified to expect UTF-8 values.

However, when the (X)HTML page is transcoded to utf-8, the embedded URLs may be links to applications that we have no control over, and they may be affected.

Therefore a more appropriate recommendation might be to first represent the query portions of a URL in hex-encoded form in the original encoding; then the page can be converted to utf-8. E.g. convert
www.i18nguy.com/?find=café
to
www.i18nguy.com/?find=caf%E9

Subsequent transcoding to utf-8 won't change the value %E9. On the other hand, simply transcoding to utf-8 will give
www.i18nguy.com/?find=caf%C3%A9
which will break the link or reference the incorrect value in the target application.

====
Haven't we been over this ground before? Perhaps in one of the other documents. The page should be updated.

tex

-----Original Message-----
From: public-evangelist-request@... [mailto:public-evangelist-request@...] On Behalf Of Richard Ishida
Sent: Thursday, November 23, 2006 2:16 AM
To: 'Paul Arenson'; public-evangelist@...
Subject: RE: japanese encoding nightmare

Paul, read this and let me know if you still have questions:

Changing (X)HTML page encoding to UTF-8
http://www.w3.org/International/questions/qa-changing-encoding

RI

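The query-string example in Tex's message can be reproduced with Python's standard `urllib.parse` module. This is just a sketch of the byte-level difference he describes; the helper name `escape_query_value` is made up for illustration, and the word is written with a `\u00e9` escape so the source file's own encoding does not matter.

```python
from urllib.parse import quote, unquote


def escape_query_value(value, source_encoding):
    """Percent-encode a query value using the bytes of the given encoding."""
    return quote(value, safe="", encoding=source_encoding)


word = "caf\u00e9"  # "cafe" with an e-acute as the last letter

# Escaped in the page's original encoding: the form the ISO-8859-1
# application expects, and one that survives later transcoding unchanged.
latin1_escaped = escape_query_value(word, "iso-8859-1")

# Escaped after the page has been transcoded to UTF-8: two bytes for e-acute.
utf8_escaped = escape_query_value(word, "utf-8")

print(latin1_escaped)
print(utf8_escaped)

# What an ISO-8859-1 application would see if handed the UTF-8 form.
print(unquote(utf8_escaped, encoding="iso-8859-1"))
```

Because the percent-escaped form is plain ASCII, converting the surrounding page to UTF-8 afterwards leaves it untouched, which is the reason for escaping the query first.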
Re: japanese encoding nightmare: conclusion

Hi

Way back a month ago I asked a question about why I was able to create a functioning web page on my Mac desktop that showed up wrong on my server. You might be interested in this report. It shows something very weird with one machine or program or set of programs...

The page was created as UTF-8, yet one of you mentioned that it was JIS (anyway, Japanese). Well, I found that when it was uploaded to another company's server, the same problem occurred. Duplicating the same thing on another Mac as well as a Windows machine, I subsequently found that a similar file worked on the desktops as well as the servers.

So I went back to the offending Mac, created a new file, and again had the same problem. When I sent that file to myself by email, picked it up on the other Mac, and then uploaded it to the web, it was fine.

Conclusion: something on that one Mac is corrupting the file. It is mysterious and never happened before. I tried downloading a new version of Mozilla midway between last time and now, and as I recall it did not change things.

Shall I conclude that something on my one Mac is corrupting things? Anyway, your guess is as good as mine, but this does seem to be a problem with one Mac. Would you guess I should reformat the thing, or do you have any idea what might cause the Mac/Mozilla/FTP program (one or all?) to mess up a file?

I can do any tests if anyone is interested.

Thanks

__/__/__/__/__/__/__/__/__/__/
Paul Arenson
EMAIL
PHONE & VOICE MAIL
1-617-379-0761 (U.S.)
090-4173-3873 (Japan)
paularenson (Skype)
__/__/__/__/__/__/__/__/__/__/

On Nov 13, 2006, at 11:25 PM, Paul Arenson wrote: