[ruby-core:26429] [Bug #2313] Incomplete encoding conversion?

Bug #2313: Incomplete encoding conversion?
http://redmine.ruby-lang.org/issues/show/2313

Author: Adam Salter
Status: Open, Priority: Normal
Category: core
ruby -v: ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10]

I get the following error in irb:

    >> "http://localhost/posts/eeé".encode('ASCII-8BIT')
    Encoding::UndefinedConversionError: "\xC3\xA9" from UTF-8 to ASCII-8BIT
        from (irb):7:in `encode'
        from (irb):7
        from /opt/local/bin/irb:12:in `<main>'

Is this a bug? ASCII-8BIT is (as far as I understand it) essentially binary, so you should be able to convert any string to ASCII-8BIT.

----------------------------------------
http://redmine.ruby-lang.org
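A minimal sketch reproducing the report, assuming a modern Ruby (2.x or 3.x) rather than the 1.9.1 shown above. The exact wording of the error message differs between versions, so only the exception class is relied on here. The URL is written with the escape `\u00E9` for "é" so the file's own source encoding does not matter.

```ruby
# "\u00E9" is "é"; a literal containing \u escapes is always UTF-8.
url = "http://localhost/posts/ee\u00E9"

# "é" is stored as the two bytes 0xC3 0xA9, the bytes named in the error.
p url.bytes.last(2)  # => [195, 169]

# String#encode transcodes character by character. ASCII characters have a
# mapping into ASCII-8BIT, but "é" does not, so the call raises.
begin
  url.encode('ASCII-8BIT')
  failed = false
rescue Encoding::UndefinedConversionError => e
  failed = true
  puts "#{e.class}: #{e.message}"
end

p failed  # => true
```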
|
|
[ruby-core:26430] [Bug #2313] (Rejected) Incomplete encoding conversion?
Issue #2313 has been updated by Yui NARUSE.

Status changed from Open to Rejected

That is not a conversion; that is setting an encoding. So you should use String#force_encoding(enc).
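A short sketch of the suggested alternative, applied to the same (escaped) URL: `force_encoding` only relabels the string, so it cannot fail for lack of a mapping. Note that it mutates and returns its receiver, so the sketch calls `dup` first to keep the UTF-8 original intact.

```ruby
url = "http://localhost/posts/ee\u00E9"   # "é" written as an escape

# Relabel a copy as binary; no byte is inspected or changed.
binary = url.dup.force_encoding('ASCII-8BIT')

p binary.encoding == Encoding::ASCII_8BIT  # => true
p binary.bytes == url.bytes                # => true: identical byte string
p url.encoding == Encoding::UTF_8          # => true: original unaffected
```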
|
|
[ruby-core:26441] [Bug #2313] Incomplete encoding conversion?
Issue #2313 has been updated by Adam Salter.

OK. I'm still a little unclear. The Ruby 1.9 docs say String#encode "returns a copy of str transcoded to encoding 'encoding'". James Edward Gray's article on strings says String#force_encoding "doesn't change the data at all, just the rules for interpreting that data". So String#force_encoding is not a conversion/transcoding.

Shouldn't you be able to String#encode any string as ASCII-8BIT? (If not, is there somewhere I can read up more on this?)
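The phrase "just the rules for interpreting that data" can be made concrete with a small sketch (assuming Ruby 1.9 or later): the same six bytes count as two characters when labelled UTF-8 and as six when labelled ASCII-8BIT. `"\u5143\u6C17"` is an escape for "元気".

```ruby
genki = "\u5143\u6C17"   # "元気": two characters, six bytes in UTF-8

puts genki.length     # 2 (counted as UTF-8 characters)
puts genki.bytesize   # 6

# force_encoding keeps the six bytes but changes how they are read:
# in ASCII-8BIT every byte is its own "character".
raw = genki.dup.force_encoding('ASCII-8BIT')
puts raw.length       # 6
puts raw.bytesize     # 6
```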
|
|
[ruby-core:26442] [Bug #2313] Incomplete encoding conversion?
Issue #2313 has been updated by Yui NARUSE.

The data of a String consists of a byte string and an encoding. String#encode changes both, but String#force_encoding changes only the encoding. You know, "converting to ASCII-8BIT" doesn't change the byte string, so this is String#force_encoding's business.
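A minimal sketch of the "byte string plus encoding" model, assuming a modern Ruby: transcoding to Shift_JIS produces both a new encoding and new bytes (each of these two kanji takes two bytes in Shift_JIS instead of three in UTF-8), while `force_encoding` leaves the bytes alone.

```ruby
genki = "\u5143\u6C17"   # "元気" in UTF-8 (6 bytes)

# String#encode: new byte string AND new encoding.
sjis = genki.encode('Shift_JIS')
p sjis.encoding == Encoding::Shift_JIS   # => true
p sjis.bytes == genki.bytes              # => false: the bytes were transcoded
p sjis.bytesize                          # => 4

# String#force_encoding: same byte string, only the label changes.
bin = genki.dup.force_encoding('ASCII-8BIT')
p bin.bytes == genki.bytes               # => true
```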
|
|
[ruby-core:26446] [Bug #2313] Incomplete encoding conversion?
Issue #2313 has been updated by Adam Salter.

OK. Thank you.

I do think it makes sense to be able to do:

    >> "元気".encode('UTF-8').encode('ASCII-8BIT').encode('UTF-8')

... even though it doesn't actually change the string bytes internally. But, I guess it's only for ASCII-8BIT that it would be necessary to use String#force_encoding.

What about this?

    >> "元気".encode('UTF-8').encode('SHIFT_JIS').encode('UTF-8')
    => "元気"
    >> "元気".encode('UTF-8').force_encoding('ASCII-8BIT').encode('UTF-8')
    Encoding::UndefinedConversionError: "\xE5" from ASCII-8BIT to UTF-8
        from (irb):24:in `encode'
        from (irb):24
        from /opt/local/bin/irb:12:in `<main>'

Is that a bug in the UTF-8 encoding parser? Or is it related to this problem?
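A sketch contrasting the two round trips from this message, assuming a modern Ruby. The Shift_JIS round trip is two genuine transcodings with mappings in both directions; the second chain relabels the bytes as binary and then asks for a transcoding, which fails because ASCII-8BIT bytes at 0x80 and above have no character mapping.

```ruby
genki = "\u5143\u6C17"   # "元気"

# UTF-8 -> Shift_JIS -> UTF-8: both steps have mappings, the text survives.
p genki.encode('Shift_JIS').encode('UTF-8') == genki   # => true

# Relabel as binary, then *transcode* back to UTF-8: byte 0xE5 is rejected.
begin
  genki.dup.force_encoding('ASCII-8BIT').encode('UTF-8')
  round_trip_failed = false
rescue Encoding::UndefinedConversionError
  round_trip_failed = true
end
p round_trip_failed   # => true
```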
|
|
[ruby-core:26450] [Bug #2313] Incomplete encoding conversion?
Issue #2313 has been updated by Yui NARUSE.

> >> "元気".encode('UTF-8').force_encoding('ASCII-8BIT').encode('UTF-8')
> Encoding::UndefinedConversionError: "\xE5" from ASCII-8BIT to UTF-8
>     from (irb):24:in `encode'
>     from (irb):24
>     from /opt/local/bin/irb:12:in `<main>'
>
> Is that a bug in the UTF-8 encoding parser? Or is it related to this problem?

OK, I'll explain step by step:

    str = "元気"
    # You make a String which contains "元气" encoded in some encoding:
    # str's byte data is some byte string which means "元気"
    # str's encoding is the source encoding

    str = str.encode('UTF-8')
    # str is encoded to UTF-8, so
    # str's byte data is "\xE5\x85\x83\xE6\xB0\x97"
    # str's encoding is UTF-8

    str.force_encoding('ASCII-8BIT')
    # change str's encoding to ASCII-8BIT, so
    # str's byte data is still "\xE5\x85\x83\xE6\xB0\x97"
    # str's encoding is now ASCII-8BIT

Then you try str.encode('UTF-8'), and this String#encode converts the byte data: it tries to convert "\xE5" from ASCII-8BIT to UTF-8, but there is no mapping.

What you want to do is not a conversion; it should be setting the encoding:

    str.force_encoding('UTF-8')
    # change str's encoding to UTF-8, so
    # str's byte data is still "\xE5\x85\x83\xE6\xB0\x97"
    # str's encoding is now UTF-8
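The steps above, made runnable as a sketch (assuming Ruby 1.9 or later). `"\u5143\u6C17"` stands in for the "元気" literal, and `dup` gives a mutable copy because `force_encoding` modifies its receiver. Relabelling back to UTF-8 restores a valid string without touching any byte.

```ruby
str = "\u5143\u6C17".dup          # "元気", already UTF-8
str = str.encode('UTF-8')         # bytes E5 85 83 E6 B0 97, encoding UTF-8
str.force_encoding('ASCII-8BIT')  # same bytes, now labelled binary

# The fix: relabel instead of transcoding.
str.force_encoding('UTF-8')
p str.bytes.map { |b| format('%02X', b) }  # => ["E5", "85", "83", "E6", "B0", "97"]
p str.valid_encoding?                      # => true
p str == "\u5143\u6C17"                    # => true
```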
|
|
[ruby-core:26464] [Bug #2313] Incomplete encoding conversion?
Issue #2313 has been updated by Adam Salter.

OK, I understand now :) I was mixing up the available encoding converters... There is no Encoding::Converter from UTF-8 to ASCII-8BIT (or vice versa ;).

Thank you for your patience.
|
|
[ruby-core:26480] Re: [Bug #2313] Incomplete encoding conversion?

Hello Adam,

On 2009/11/01 10:35, Adam Salter wrote:
> Issue #2313 has been updated by Adam Salter.
>
>
> OK I understand now :) I was mixing up the available encoding converters... There is no Encoding::Converter from UTF-8 to ASCII-8BIT (or vice versa ;).

No, there should be an Encoding::Converter from UTF-8 to ASCII-8BIT (or you should be able to create one). The underlying conversion table is available. For example, the following works:

    puts 'abc'.encode('UTF-8').encode('ASCII-8BIT')
    => abc

The reason this works is that ASCII-8BIT is defined to contain (7-bit) ASCII. The fact that "元気".encode('UTF-8').encode('ASCII-8BIT') doesn't work is very similar to the fact that, e.g., "Dürst".encode('UTF-8').encode('shift_jis') doesn't work: there is no "ü" character in Shift_JIS, and there is no "元" character in ASCII-8BIT. So the transcoding engine has to give up, usually with an exception.

This can also be understood by noticing that String#encode tries to preserve character identity. If we just copied arbitrary bytes into an ASCII-8BIT string, we would still have the same bytes (you can do that with force_encoding), but the only thing Ruby knows is that these are bytes; it has no idea which characters they represent. That's why, both for removing such information (e.g. with .force_encoding('ASCII-8BIT')) and for adding such information (e.g. with .force_encoding('UTF-8')), we use a long and forceful method name that should give programmers the message "watch out, you need to know by yourself what you're doing".

Regards,
   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...
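A sketch of Martin's point that `String#encode` preserves character identity and gives up when the target lacks a character, assuming a modern Ruby. `transcodes?` is a hypothetical helper written only for this illustration; `"D\u00FCrst"` is "Dürst" and `"\u5143\u6C17"` is "元気".

```ruby
# 7-bit ASCII exists in ASCII-8BIT, so this transcoding succeeds.
abc = 'abc'.encode('UTF-8').encode('ASCII-8BIT')
p abc                                    # => "abc"
p abc.encoding == Encoding::ASCII_8BIT   # => true

# Hypothetical helper: does str have a mapping into enc?
def transcodes?(str, enc)
  str.encode(enc)
  true
rescue Encoding::UndefinedConversionError
  false
end

p transcodes?("D\u00FCrst", 'Shift_JIS')     # => false: no "ü" in Shift_JIS
p transcodes?("\u5143\u6C17", 'ASCII-8BIT')  # => false: no "元" in ASCII-8BIT
p transcodes?("\u5143\u6C17", 'Shift_JIS')   # => true
```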