How to translate utf-8 hex to unicode hex?

Hi,

My input is from HTTP, 3 hard-coded bytes of UTF-8 hex value.
What I want is 2 bytes unicode.

For example:
    let input = "%E9%A6%AC"
    let output = "99AC"

Based on the output, I can then get the real CJK: 馬.

Is it possible to do it from within Vim?

Thanks
Sean

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---

Re: How to translate utf-8 hex to unicode hex?

On 10/11/09 19:44, Sean wrote:
> For example:
>     let input = "%E9%A6%AC"
>     let output = "99AC"
> [...]
> Is it possible to do it from within Vim?

You can do it the hard way, with arithmetic computations which I shall
explain below.

Or you can do it the easy way, by writing the bytes to disc as if they
were Latin1 (see ":help ++opt") and reading them back as UTF-8.

Or you can use the iconv() function (q.v.).

UTF-8 bytes are divided in "waterproof" categories as follows:

- Bytes 0x00 to 0x7F are "single" bytes; they each represent a single
  codepoint in the exact same format as in Latin-1 or 7-bit US-ASCII.

- Bytes 0xC0 to (currently) 0xF4 or (as originally foreseen and still
  supported by Vim) 0xFD are "header" bytes in a multibyte sequence.
  Such a byte MUST be the first byte of its sequence, and the number of
  "one" bits above the topmost "zero" bit indicates the number of bytes
  (including this one) in the whole sequence.

- Bytes 0x80 to 0xBF are "trailer" bytes in a multibyte sequence. They
  can be any byte in the sequence except the first.

- Bytes 0xFE and 0xFF are always invalid anywhere in UTF-8 text.

- In the bytes of a multibyte sequence, all bits after the topmost
  "zero" bit in each byte constitute the "payload": they are data bits,
  and in UTF-8 the most significant bits always come first.

Your example translates as follows:

    0xE9 = 1110.1001 binary
        header byte
        the sequence is of three bytes
        payload: 1001
    0xA6 = 1010.0110 binary
        trailer byte
        payload: 100110
    0xAC = 1010.1100 binary
        trailer byte
        payload: 101100
    Result (concatenated payload bits) 1001.1001.1010.1100 binary,
    or U+99AC

Note that some hanzi are above U+20000; the UTF-8 code for them consists
of four bytes, not three: e.g. 𠄣 = U+20123 = UTF-8 0xF0 0xA0 0x84 0xA3
= %F0%A0%84%A3 in "percent-escaped" HTTP coding.

The Unicode code space had originally been foreseen as ranging from
U+0000 to U+7FFFFFFF but the current standards say that no codepoints
above U+10FFFD will ever be valid; also, codepoints whose hex
representation is xxFFFE or xxFFFF (where xx is anything) have been
expressly designated as invalid, never to be used.

Best regards,
Tony.
-- 
Putt's Law:
    Technology is dominated by two types of people:
        Those who understand what they do not manage.
        Those who manage what they do not understand.

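The "hard way" arithmetic described above can be sketched in Vim script. This is only an illustration under stated assumptions: the function name Utf8HexToCodepoint is hypothetical (not a Vim built-in), the input is assumed to hold exactly one percent-escaped UTF-8 sequence of one to four bytes, and overlong or otherwise ill-formed sequences are not fully checked. It sticks to plain arithmetic (% and *), so it should not depend on the bitwise functions of newer Vim versions.

```vim
" Sketch: turn one percent-escaped UTF-8 sequence into its codepoint in hex.
" Utf8HexToCodepoint is a hypothetical name, not part of Vim.
function! Utf8HexToCodepoint(escaped)
  " '%E9%A6%AC' -> ['E9', 'A6', 'AC'] -> [233, 166, 172]
  let bytes = map(split(a:escaped, '%'), 'str2nr(v:val, 16)')
  if empty(bytes)
    return ''
  endif
  let head = bytes[0]
  " The header byte gives the sequence length and the first payload bits
  if head < 0x80
    let [cp, trailers] = [head, 0]
  elseif head >= 0xC0 && head < 0xE0
    let [cp, trailers] = [head % 0x20, 1]
  elseif head >= 0xE0 && head < 0xF0
    let [cp, trailers] = [head % 0x10, 2]
  elseif head >= 0xF0 && head < 0xF8
    let [cp, trailers] = [head % 0x08, 3]
  else
    " A trailer byte or an unsupported header byte cannot start a sequence
    return ''
  endif
  if len(bytes) != trailers + 1
    return ''
  endif
  " Each trailer byte (0x80 to 0xBF) contributes six payload bits
  for byte in bytes[1:]
    if byte < 0x80 || byte > 0xBF
      return ''
    endif
    let cp = cp * 64 + (byte % 64)
  endfor
  return printf('%04X', cp)
endfunction

echo Utf8HexToCodepoint('%E9%A6%AC')
" -> 99AC
echo Utf8HexToCodepoint('%F0%A0%84%A3')
" -> 20123
```

An empty string is returned for anything the sketch does not recognise, so a caller can tell "not decodable" apart from a real codepoint.
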
Re: How to translate utf-8 hex to unicode hex?

Hi Tony,

I thought I had enough knowledge on UNICODE and UTF8, but it is
nothing after reading your message.

Now, I get what I want:

    let input = "\xE9\xA6\xAC"
    let output=iconv(input, "utf-8", "utf8")

Bingo! The output is real ==> '馬'

Thanks again.
Sean

On Nov 10, 12:06 pm, Tony Mechelynck <antoine.mechely...@...> wrote:
> [...]
> Or you can use the iconv() function (q.v.).
> [...]

Re: How to translate utf-8 hex to unicode hex?

On 10/11/09 22:24, Sean wrote:
> Now, I get what I want:
>
>     let input = "\xE9\xA6\xAC"
>     let output=iconv(input, "utf-8", "utf8")
>
> Bingo! The output is real ==> '馬'

The particular parameters you give to iconv make it an identity
conversion: the source and target encodings are the same, so the
bytes come back unchanged. When I do

    :echo "\xE9\xA6\xAC"

(with 'encoding' set to "utf-8") the result is

    馬

Best regards,
Tony.
-- 
"I am, in point of fact, a particularly haughty and exclusive person, of
pre-Adamite ancestral descent. You will understand this when I tell you
that I can trace my ancestry back to a protoplasmal primordial atomic
globule. Consequently, my family pride is something inconceivable. I
can't help it. I was born sneering."
        -- Pooh-Bah, "The Mikado", Gilbert & Sullivan

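Since the three bytes already are the character once 'encoding' is "utf-8", the codepoint asked for in the original question can probably be read off with char2nr() instead of iconv(). A minimal sketch, assuming 'encoding' is set to "utf-8"; CodepointHex is a hypothetical helper name, not a Vim built-in:

```vim
" Assumes 'encoding' is utf-8, as in the message above.
" CodepointHex is a hypothetical helper name.
function! CodepointHex(char)
  " char2nr() returns the number value of the first character of its
  " argument; with a utf-8 'encoding' that is the Unicode codepoint
  return printf('%04X', char2nr(a:char))
endfunction

echo CodepointHex("\xE9\xA6\xAC")
" with 'encoding' set to utf-8 -> 99AC
```

With a non-Unicode 'encoding' the same call would only see the first byte, so the result depends on that setting.
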
Re: How to translate utf-8 hex to unicode hex?

Hi Tony,

The last mile to go:

This always worked: (output is 馬)
-------------------------------------------
let input = "\xE9\xA6\xAC"
let output = iconv(input, "UTF-8", "UTF-8")
-------------------------------------------

However, this failed: (output is '\xE9\xA6\xAC')
-------------------------------------------
let input = "%E9%A6%AC"
let input = substitute(input, '%', '\\x', 'g')
let output = iconv(input, "UTF-8", "UTF-8")
-------------------------------------------

Now, the key becomes how to translate string (single quoted) to string
(double quoted). I guess that "\x" is meaningful only within double
quote.

Thanks
Sean

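The difference guessed at above can be made visible with len(), which counts bytes. This is a small sketch rather than a fix; the variable names literal and decoded are chosen here for illustration only.

```vim
" Double quotes: \xE9 is interpreted when the constant is parsed
echo len("\xE9")
" -> 1 (one byte with value 0xE9)

" Single quotes: nothing is interpreted
echo len('\xE9')
" -> 4 (backslash, x, E, 9)

" substitute() builds text at run time, so the result is the literal
" text \xE9\xA6\xAC and no escape is ever interpreted
let literal = substitute('%E9%A6%AC', '%', '\\x', 'g')
echo len(literal)
" -> 12

" eval() parses that text again as a double-quoted string constant
let decoded = eval('"' . literal . '"')
echo len(decoded)
" -> 3
```

Passing arbitrary HTTP input through eval() like this would be risky, because a double quote in the input ends the string constant early; restricting what reaches eval() to hex digits avoids that.
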
Re: How to translate utf-8 hex to unicode hex?

On 11/11/09 02:06, Sean wrote:
> However, this failed: (output is '\xE9\xA6\xAC')
> -------------------------------------------
> let input = "%E9%A6%AC"
> let input = substitute(input, '%', '\\x', 'g')
> let output = iconv(input, "UTF-8", "UTF-8")
> -------------------------------------------
>
> Now, the key becomes how to translate string (single quoted) to string
> (double quoted). I guess that "\x" is meaningful only within double
> quote.

what about (untested)

    function HttpToString(str)
        return substitute(a:str, '%\(\x\x\)',
            \ '\=eval(''"\x'' . submatch(1) . ''"'')', 'g')
    endfunction

If I haven't goofed,

    :echo HttpToString('%E9%A6%AC')

ought to return 馬

Note the use of pairs of single quotes to represent actual single quotes
in a single-quoted string. The use of a continuation line assumes
'nocompatible'.

See also
    :help sub-replace-expression
    :help eval()

Best regards,
Tony.
-- 
Lizzie Borden took an axe,
And plunged it deep into the VAX;
Don't you envy people who
Do all the things _YOU_ want to do?

Re: How to translate utf-8 hex to unicode hex?

Hi Tony,

You are a real genius! It simply worked without modification!

I added it as part of the VimIM plugin online:
http://maxiangjiang.googlepages.com/vimim.vim.html

    " ================================ }}}
    " ==== VimIM SoGou Cloud IM ==== {{{
    " ====================================

Now, let me show you the power of Vim:

input in PinYin      => woyouyigeqiguaidemeilidemeng
output in Chinese    => 我有一个奇怪的美丽的梦
It is meaningless :) => "I have a strange but beautiful dream."

This is my gift to you:
http://maxiangjiang.googlepages.com/dream.png

Thanks
Sean

Re: How to translate utf-8 hex to unicode hex?

On Wed, Nov 11, 2009 at 12:15 PM, Sean <maxiangjiang@...> wrote:
> Hi Tony,

You are the author of VimIM? Nice to see you here.

And you want to take the sogou-cloud input result and translate the
content from the URL into Vim content?

Really amazing thought ;-)

Re: How to translate utf-8 hex to unicode hex?

On 11/11/09 05:15, Sean wrote:
> Now, let me show you the power of Vim:
>
> input in PinYin      => woyouyigeqiguaidemeilidemeng
> output in Chinese    => 我有一个奇怪的美丽的梦
> It is meaningless :) => "I have a strange but beautiful dream."

Meaningless? It evokes powerful meanings to me; I link it with Martin
Luther King's famous "I Have a Dream" speech, and, maybe less known, the
Hymn ("La Espero", i.e. "Hope") of the Esperantist movement, which ends
in words meaning "Our diligent colleagues won't tire in a labour of
peace, till the beautiful dream of mankind shall come true for eternal
blessing".

> This is my gift to you:
> http://maxiangjiang.googlepages.com/dream.png
>
> Thanks

My thanks to you, for the beautiful sentence in hanzi.

Best regards,
Tony.
-- 
hundred-and-one symptoms of being an internet addict:
187. You promise yourself that you'll only stay online for another
     15 minutes...at least once every hour.

Re: How to translate utf-8 hex to unicode hex?

This is what I am using (needs +python):

    " %xx -> corresponding characters (echoed as a message) [[[2
    function Lilydjwg_hexchar()
      let chars = Lilydjwg_get_pattern_at_cursor('\(%[[:xdigit:]]\{2}\)\+')
      if chars == ''
        echohl WarningMsg
        echo 'No %-encoded hex string found at the cursor!'
        echohl None
        return
      endif
      let str = substitute(chars, '%', '\\x', 'g')
      exe 'py print ''' . str . ''''
    endfunction

    " Get the match at the cursor [[[2
    function Lilydjwg_get_pattern_at_cursor(pat)
      " This is a function I borrowed from another plugin
      let col = col('.') - 1
      let line = getline('.')
      let ebeg = -1
      let cont = match(line, a:pat, 0)
      while (ebeg >= 0 || (0 <= cont) && (cont <= col))
        let contn = matchend(line, a:pat, cont)
        if (cont <= col) && (col < contn)
          let ebeg = match(line, a:pat, cont)
          let elen = contn - ebeg
          break
        else
          let cont = match(line, a:pat, contn)
        endif
      endwh
      if ebeg >= 0
        return strpart(line, ebeg, elen)
      else
        return ""
      endif
    endfunction

    nmap <silent> t% :call Lilydjwg_hexchar()<CR>

After writing this to .vimrc, when I move the cursor to where the %xx
string is and press t%, I can see the decoded string.

Or you can just use a program called ascii2uni, e.g.:

    echo %E9%A6%AC | ascii2uni -q -a J

and the output is 馬. You can combine this with the filter (:h filter)
feature of Vim to get lines of characters converted directly from within
Vim.

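For converting whole lines without an external program or +python, the substitute()/eval() technique from earlier in the thread could be wrapped in a range command. This is a sketch under assumptions: the names DecodePercentString, DecodePercentRange and :DecodePercent are hypothetical, 'encoding' is assumed to be "utf-8", and the continuation line assumes 'nocompatible'.

```vim
" Hypothetical helpers; the names are not part of Vim or of any plugin.
function! DecodePercentString(str)
  " Only two hex digits ever reach eval(), wrapped as a "\xNN" constant,
  " so other text in the line is never evaluated
  return substitute(a:str, '%\(\x\x\)',
        \ '\=eval(''"\x'' . submatch(1) . ''"'')', 'g')
endfunction

function! DecodePercentRange() range
  " Rewrite every line of the given range in place
  for lnum in range(a:firstline, a:lastline)
    call setline(lnum, DecodePercentString(getline(lnum)))
  endfor
endfunction

command! -range DecodePercent <line1>,<line2>call DecodePercentRange()
```

With this loaded, `:DecodePercent` would decode the current line and `:%DecodePercent` the whole buffer. A `%` that is not followed by two hex digits is left untouched.
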