possible UTF-8 buglet

by Jeff Breidenbach

I'm seeing a little bit of weird behaviour in the Tamil language.
Check out the subject line on the message page, which is fine, versus
the same subject line on the index page, which is rendering
incorrectly due to a UTF-8 character being split by whitespace. (This
is the latest message, from 2009/03/21.) Basically, we get different
results between $SUBJECTNA$ and $SUBJECT$, with the former being
correct. The more-or-less raw message can be obtained from the
Pipermail archive, or I can supply it.

http://www.mail-archive.com/ubuntu-l10n-tam@.../msg00617.html
http://www.mail-archive.com/ubuntu-l10n-tam@.../maillist.html
https://lists.ubuntu.com/archives/ubuntu-l10n-tam/
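
One way to test the "character split by whitespace" theory is to check whether the subject bytes still decode as UTF-8: a whitespace byte landing inside a multi-byte sequence makes decoding fail. A minimal Python sketch follows. The helper name `first_invalid_utf8_offset` and the sample bytes are illustrative assumptions, not part of MHonArc or of the message in question.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical sample: one Tamil letter (U+0BA4), three bytes in UTF-8.
intact = "\u0ba4".encode("utf-8")
# Simulate a space inserted after the first byte of that sequence.
split = intact[:1] + b" " + intact[1:]
```

If the function returns None for the subject bytes taken from the index page, the encoded text is intact and the problem is more likely on the rendering side.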

<TextEncode>
utf-8; MHonArc::UTF8::to_utf8; MHonArc/UTF8.pm
</TextEncode>

<LiTemplate>
<li>
<span class="date">$YYYYMMDD$</span>
<span class="subject">$SUBJECT$</span>
<span class="sender">$FROMNAME$</span>
</LiTemplate>

<SubjectHeader>
                <div class="msgHead">
                        <h1><span class="subject">$SUBJECTNA:200$</span></h1>
                        <p><span class="sender">$FROMNAME:200$</span><br>
                        <span class="date">$DATE:200$</span></p>
                </div>
</SubjectHeader>


# perl -v
This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

# mhonarc -v
  MHonArc v2.6.16+ (Perl 5.008008 linux)


Re: possible UTF-8 buglet

by Earl Hood

On March 22, 2009 at 14:43, Jeff Breidenbach wrote:

> I'm seeing a little bit of weird behaviour in the Tamil language.
> Check out the subject line on the message page, which is fine, versus
> the same subject line on the index page, which is rendering
> incorrectly due to a UTF-8 character being split by whitespace. (This
> is the latest message, from 2009/03/21.) Basically, we get different
> results between $SUBJECTNA$ and $SUBJECT$, with the former being
> correct. The more-or-less raw message can be obtained from the
> Pipermail archive, or I can supply it.
>
> http://www.mail-archive.com/ubuntu-l10n-tam@.../msg00617.html
> http://www.mail-archive.com/ubuntu-l10n-tam@.../maillist.html

I see no difference in the message subject in question.

I first installed Tamil fonts on my Linux system so that the text would
render using a proper font.  Even then, I can see no difference.

However, since I know nothing of Tamil and do not trust my eye to
accurately determine any differences, I extracted the text from the
index file and the message file and ran cmp on them: no difference.
That is, a byte-for-byte comparison of the subject text shows the two
copies are identical.
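
The comparison described above can be sketched in Python for anyone who wants to repeat it. This is not the exact procedure used here. It assumes the `<span class="subject">` markup from the templates quoted earlier in the thread, and it strips inner tags on the assumption that the index entry wraps the subject in a link. The function names are hypothetical.

```python
import re

# Matches the first subject span; the class name comes from the posted templates.
SUBJECT_RE = re.compile(rb'<span class="subject">(.*?)</span>', re.DOTALL)
TAG_RE = re.compile(rb"<[^>]+>")


def subject_bytes(html: bytes) -> bytes:
    """Return the raw bytes of the first subject span, with inner tags removed."""
    match = SUBJECT_RE.search(html)
    if match is None:
        raise ValueError("no subject span found")
    return TAG_RE.sub(b"", match.group(1)).strip()


def same_subject(index_html: bytes, message_html: bytes) -> bool:
    """Compare the two subjects byte for byte, as cmp would."""
    return subject_bytes(index_html) == subject_bytes(message_html)
```

Because an index page holds many entries, the caller would need to pass only the fragment for the message in question. Reading the files in binary mode keeps the comparison at the byte level.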

Could you elaborate more on what difference you are actually seeing?

Could the problem be due to how your browser/display is configured?
Do you have the proper fonts installed?

If you can provide a message that generates a byte-for-byte difference
for the same subject text, I can examine further.

--ewh


Re: possible UTF-8 buglet

by Jeff Breidenbach

I upgraded from Ubuntu 8.10 to 9.04 and no longer see any visual
difference for Tamil in Firefox. I guess it was a now-fixed Firefox
issue all along.

On Sun, Mar 22, 2009 at 10:45 PM, Jeff Breidenbach <jeff@...> wrote:
> I'm attaching screenshots of what I am seeing - Firefox is claiming
> some whitespace (it acts like tab when I play around with highlighting
> in "view source"). On the other hand, I just repeated your byte
> comparison experiment and they look identical. Sorry! This is very
> confusing.
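
When a browser appears to show whitespace that a byte comparison does not find, listing the decoded code points is one way to separate a data problem from a rendering problem. A small Python sketch, using only the standard `unicodedata` module. The function names and sample strings are illustrative assumptions.

```python
import unicodedata


def describe(text: str):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def whitespace_offsets(text: str):
    """Return the indexes of every whitespace character in text."""
    return [i for i, ch in enumerate(text) if ch.isspace()]
```

If `whitespace_offsets` reports nothing inside a word that the browser draws with a gap, the gap is being introduced at display time (fonts or text shaping) rather than by the archive software.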