Re: Unicode or UTF-8 encoded file support?

by Eric Miao

Ok,

I'll write in English for the sake of our non-Chinese-speaking friends :-)

Well, from the editor's point of view, each Chinese character occupies
the width of two ASCII characters, and people normally make the ruler
below the same length as the title line above it.

E.g.

MMNNMMNNMMNN
============

Each MM or NN stands for a single Chinese character. If asciidoc reads the
file correctly (I mean as Unicode, with UTF-8 as the codec), the actual length
of the wide string (or Unicode string, whatever) above is 6, while the number
of "=" symbols below it is 12. That should easily satisfy the potential
requirement that the number of "=" symbols in the ruler be greater than the
length of the string above it.
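
To make the two counts concrete, here is a minimal Python 3 sketch. The
title is a made-up six-character stand-in for MMNNMMNNMMNN, and
display_width is a hypothetical helper of mine, not anything taken from
asciidoc. It assumes that characters classed as Wide or Fullwidth by the
standard unicodedata module take two editor columns.

```python
import unicodedata

def display_width(text):
    """Columns the text occupies in a monospace editor (hypothetical helper)."""
    # East Asian Wide ("W") and Fullwidth ("F") characters take two columns;
    # everything else is counted as one column in this sketch.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

title = "中文标题示例"  # six Chinese characters, made-up example
print(len(title))            # number of characters in the decoded string
print(display_width(title))  # number of columns an editor would show
```

The character count is what a Unicode-aware reader would compare against
the ruler, while the column count is what people see when they type the
ruler in their editor.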

But if asciidoc does not read the file correctly, it sees raw bytes. The
UTF-8 encoding of a Chinese character is normally 3 bytes, so the above
example turns out to be 18 bytes, which makes asciidoc believe there are
not enough "=" symbols.
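
A small Python 3 sketch of the difference, again with a made-up
six-character title and a 12-character ruler. I am only assuming here that
the check compares the ruler length with the title length; I have not
looked at how asciidoc actually does it.

```python
title = "中文标题示例"       # six Chinese characters, made-up example
raw = title.encode("utf-8")  # what a byte-oriented reader sees
ruler = "=" * 12

print(len(raw))                  # length in bytes, before decoding
print(len(raw.decode("utf-8")))  # length in characters, after decoding
print(len(ruler) >= len(raw))    # ruler compared with the undecoded title
print(len(ruler) >= len(title))  # ruler compared with the decoded title
```

Only the undecoded comparison fails, which matches the behaviour described
above.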

Ignoring the UTF-8 encoding of the input file could be a disaster in some
cases. Playing tricks may solve some of the issues, but not all. So I really
hope Unicode will be supported in the future, especially since Python
already comes with good Unicode support.
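
As a rough idea of what I mean, decoding the input while reading it could
look like the Python 3 sketch below. read_title_and_ruler is a hypothetical
helper, the sample file is made up, and this is not how asciidoc reads its
input today.

```python
import os
import tempfile

def read_title_and_ruler(path):
    """Return the first two lines of a file decoded as UTF-8 (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        title = f.readline().rstrip("\n")
        ruler = f.readline().rstrip("\n")
    return title, ruler

# Write a small made-up sample file, then read it back decoded.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("中文标题示例\n============\n")

title, ruler = read_title_and_ruler(tmp.name)
os.remove(tmp.name)
print(len(title), len(ruler))  # lengths in characters, not bytes
```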


2008/2/19 Hick <hick@...>:

> Hehe, maybe I know the reason.
>
> Just use fewer Chinese characters or make the underline longer.
>
>
>
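> ------- translated from Chinese
>
> I ran into this problem at first too. It seems that the length of a
> section title and the length of its underline marker must be related in
> a certain way. Counting in English characters, the underline apparently
> must be no shorter than the title, and one Chinese character counts as
> two English characters.
>
> An overly long underline is annoying, so I later switched to the
> = Title = style and had no more problems.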
>
> ------------------ Original Message ------------------
>
> From: "Eric Miao"<eric.y.miao@...>;
> Sent: Tuesday, February 19, 2008, 05:54 PM
> To: "Yannick Gingras"<ygingras@...>;
> Cc: "asciidoc-discuss"<asciidoc-discuss@...>;
> Subject: Re: [asciidoc-discuss] Unicode or UTF-8 encoded file support?
>
>
>
> Yannick Gingras wrote:
> > "eric miao" <eric.y.miao@...> writes:
> >
> >
> >> I tried asciidoc on many UTF-8 files, but all of them failed to
> >> produce any output. Python already includes Unicode and support
> >> for many codecs, so I'm just wondering how much effort it would
> >> take to support Unicode and UTF-8 encoding in asciidoc?
> >>
> >
> > Unicode is a top priority of the 9.0 line.  Can you share a file that
> > failed to produce output?
> >
> >
> As in the attached file. It comes from the asciidoc 8.2.4 release.
> I changed the first line to Chinese, encoded as UTF-8, and edited it
> with Vim for Windows.
>
> Thanks!
>
> Cheers
> - eric
>
>



--
Cheers
- eric
_______________________________________________
asciidoc-discuss mailing list
asciidoc-discuss@...
http://lists.metaperl.com/cgi-bin/mailman/listinfo/asciidoc-discuss
