Distinct performance issues with Japanese only on win32 systems

by David E. Hollingsworth-2

Hello,

I have noticed significant performance issues with pango when
displaying Japanese text on win32 systems.  Thus far, I've assumed
that this was a configuration issue, but I've encountered the problem
using the stock gtk+ win32 binaries on what I believe is a standard
msys/mingw setup on both English and Japanese versions of Windows XP.

In addition to using the stock pango, I've tried building pango myself
using different versions, I've checked pango.modules and
pango.aliases, and I've tried various combinations of LANG and Windows
language settings; the numbers below are the best I've obtained for
each over multiple runs.

Here's the sort of timing I'm seeing for pango_layout_check_lines()
for the test files referenced below:

Setup     English (10.9k bytes/chars)   Japanese (12.6k bytes, 4.5k chars)
Linux     34ms                          57ms
Windows   10ms                          940ms

Obviously the particular values will vary by system, but my machine is
a year-old developer-class machine, and 940ms is long in human terms.
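
For anyone reproducing these numbers, a best-of-several-runs harness
could look roughly like the sketch below.  It is not the code from
pango-test1.c: best_of_runs_ms(), layout_fn, and dummy_layout() are
hypothetical names, and the dummy workload only stands in for a real
layout pass.  clock() is used for portability, so the result is
whatever time clock() reports on the platform.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one complete layout pass over the test text. */
typedef void (*layout_fn) (void *user_data);

/* Run fn `runs` times and return the fastest run, in milliseconds,
 * as measured by clock(). */
static double
best_of_runs_ms (layout_fn fn, void *user_data, int runs)
{
  double best = -1.0;
  int i;

  assert (fn != NULL && runs > 0);

  for (i = 0; i < runs; i++)
    {
      clock_t start = clock ();
      double elapsed;

      fn (user_data);
      elapsed = (double) (clock () - start) * 1000.0 / CLOCKS_PER_SEC;

      /* Keep the minimum: it is the run least disturbed by other activity. */
      if (best < 0.0 || elapsed < best)
        best = elapsed;
    }

  return best;
}

/* Stand-in workload so the sketch runs on its own.  A real test would
 * build a PangoLayout here and force it to lay out its lines. */
static void
dummy_layout (void *user_data)
{
  volatile unsigned long *counter = user_data;
  unsigned long i;

  for (i = 0; i < 100000UL; i++)
    (*counter)++;
}
```

Calling best_of_runs_ms (dummy_layout, &counter, 5) returns the fastest
of five runs; replacing dummy_layout with a function that lays out one
of the test files would give figures comparable in spirit to the ones
above.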

Some brief explorations suggest that the 940ms case involves making
many calls to Uniscribe (each roughly 10ms in cost), but since
Japanese is not a complex script, it is unclear to me why layout would
require Uniscribe at all.

The files referenced above can be found here:

http://www.fastanimals.com/tech/pango/pango-test1.c      (Linux file)
http://www.fastanimals.com/tech/pango/pango-win-test1.c  (Windows file)
http://www.fastanimals.com/tech/pango/udhr-en.txt        (English)
http://www.fastanimals.com/tech/pango/udhr-ja.txt        (Japanese)
http://www.gtk.org/download-windows.html                 (gtk All-in-one bundle)
http://www.mingw.org/wiki/msys                           (msys/mingw)

I would appreciate it if someone else would try compiling
pango-win-test1.c to see if they have similar results.

Thank you for any suggestions on how to proceed,

David E Hollingsworth
deh@...

--
"I've just found the silverware and I'm sticking a fork in that square!" - N.H.
_______________________________________________
gtk-i18n-list mailing list
gtk-i18n-list@...
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list

Re: Distinct performance issues with Japanese only on win32 systems

by David E. Hollingsworth-2

Hello,

I have investigated this more.  It does not appear to be a
configuration issue, nor specific to the particular code I was using.
The problem appears to be the way that Pango is using Uniscribe.


If you run gedit for win32 and load a moderate-length (120kB) Japanese
document, it takes several seconds, whereas an English equivalent
document is essentially instantaneous.  For comparison, the Japanese
document loads in under 1s using WordPad or gedit on Linux.

gedit for win32 can be found here:

  http://ftp.gnome.org/pub/gnome/binaries/win32/gedit/

The documents I used were:

  http://www.fastanimals.com/tech/pango/udhr-ja-x10.txt
  http://www.fastanimals.com/tech/pango/udhr-en-x10.txt

If I'm missing something here -- in particular, if someone has an
application that uses Pango on Windows to display CJK texts that's not
suffering from performance problems -- I'd like to hear about it!  It
seems like such problems are unavoidable.


Looking into the basic-win32 module, what's happening is that Pango
appears to use -- when text_is_simple() returns false -- Uniscribe's
full itemize-and-shape algorithm for each Pango item.  I presume it
does this because there's some case where Pango considers something a
single item but Uniscribe breaks it into multiple items, but I don't
know what case that might be.  Anyway, Uniscribe, like Pango, wants to
operate on complete paragraphs, so it wouldn't be surprising if
calling ScriptItemize for each Pango item was slow.

Japanese in particular gets hit hard by this technique because it
routinely mixes multiple scripts (kana, kanji, and ASCII characters),
so it ends up with a lot of Pango items.  text_is_simple() returns
false for kana and kanji characters because Uniscribe's
ScriptIsComplex() returns S_OK for SIC_COMPLEX for such characters.
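
The fragmentation can be illustrated with a simplified run counter.
This is only a sketch under stated assumptions: it classifies code
points by a few hard-coded Unicode ranges rather than using Pango's
itemizer, it assumes valid UTF-8 input, and count_script_runs() and the
other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  RUN_LATIN, RUN_HIRAGANA, RUN_KATAKANA, RUN_HAN, RUN_OTHER
} run_class;

/* Very coarse classification by Unicode block (an assumption of this
 * sketch; real itemizers use per-character script properties). */
static run_class
classify (unsigned long cp)
{
  if (cp < 0x80)                     return RUN_LATIN;
  if (cp >= 0x3040 && cp <= 0x309F)  return RUN_HIRAGANA;
  if (cp >= 0x30A0 && cp <= 0x30FF)  return RUN_KATAKANA;
  if (cp >= 0x4E00 && cp <= 0x9FFF)  return RUN_HAN;
  return RUN_OTHER;
}

/* Decode one code point from valid UTF-8 and advance *p past it. */
static unsigned long
next_code_point (const unsigned char **p)
{
  const unsigned char *s = *p;
  unsigned long cp;
  int extra;

  if (s[0] < 0x80)       { cp = s[0];        extra = 0; }
  else if (s[0] < 0xE0)  { cp = s[0] & 0x1F; extra = 1; }
  else if (s[0] < 0xF0)  { cp = s[0] & 0x0F; extra = 2; }
  else                   { cp = s[0] & 0x07; extra = 3; }

  s++;
  while (extra-- > 0)
    cp = (cp << 6) | (*s++ & 0x3F);

  *p = s;
  return cp;
}

/* Count maximal runs of characters that share one class. */
static size_t
count_script_runs (const char *utf8)
{
  const unsigned char *p = (const unsigned char *) utf8;
  size_t runs = 0;
  run_class prev = RUN_OTHER;

  assert (utf8 != NULL);

  while (*p != '\0')
    {
      run_class cur = classify (next_code_point (&p));

      if (runs == 0 || cur != prev)
        runs++;
      prev = cur;
    }

  return runs;
}
```

For a short string made of one kanji, one hiragana, the Latin word
"Pango", one hiragana, one kanji, and one hiragana, count_script_runs()
reports six runs, whereas the plain ASCII string "hello" reports one.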

But I haven't been able to figure out the benefit of using Uniscribe
for most CJK texts.  I can think of some cases where Uniscribe might
provide some benefit (vertical substitution, ambiguous-width
characters, combining marks), but it seems like there are many
identifiable situations where Uniscribe isn't adding any benefit.

Anyway, skipping Uniscribe for CJK wouldn't help other languages that
do require Uniscribe.  I haven't done performance comparisons for
those; perhaps texts in other languages don't result in quite as much
item fragmentation as Japanese texts.

  --deh!

--
"I've just found the silverware and I'm sticking a fork in that square!" - N.H.

Re: Distinct performance issues with Japanese only on win32 systems

by Tor Lillqvist

> If you run gedit for win32 and load a moderate-length (120kB) Japanese
> document, it takes several seconds, whereas an English equivalent
> document is essentially instantaneous.

Hmm, is this the time it takes to process the entire document through
Pango? Or just one screenful? I.e., is the time proportional to the
length of the document?

> Looking into the basic-win32 module, what's happening is that Pango
> appears to use -- when text_is_simple() returns false -- Uniscribe's
> full itemize-and-shape algorithm for each Pango item.

It's been a while since I wrote that code... Have you experience of
using Uniscribe? Do you think it would be possible to bypass the
Uniscribe itemization then assuming that Pango's itemization is "good
enough" for Uniscribe, too? Are you able to build Pango for Windows,
can you experiment and come up with a patch?

> I presume it
> does this because there's some case where Pango considers something a
> single item but Uniscribe breaks it into multiple items,

Setting the environment variable PANGO_WIN32_DEBUG will cause lots of
debugging output to be printed to stdout. From that it should be
possible to investigate what is going on, for instance whether
Uniscribe itemizes one Pango item further.

> But I haven't been able to figure out the benefit of using Uniscribe
> for most CJK texts.

It might be worthwhile then to explicitly check for a simple case of
just plain CJK characters and make text_is_simple() explicitly return
true in that case then?
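
One way such a check might look is sketched below.  is_plain_cjk() is a
hypothetical helper, not an existing Pango function, and the ranges are
an assumption: they deliberately leave out the combining kana voicing
marks (U+3099, U+309A) and the ideographic tone marks (U+302A onward),
since combining marks are one of the cases where Uniscribe might still
matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if every UTF-16 code unit in wtext is
 * ASCII, basic CJK punctuation, a kana letter, or a BMP unified
 * ideograph; return 0 as soon as anything else is seen. */
static int
is_plain_cjk (const uint16_t *wtext, size_t wlen)
{
  size_t i;

  assert (wtext != NULL || wlen == 0);

  for (i = 0; i < wlen; i++)
    {
      uint16_t c = wtext[i];

      if (c < 0x0080)
        continue;                       /* ASCII */
      if (c >= 0x3000 && c <= 0x3029)
        continue;                       /* CJK symbols and punctuation,
                                           stopping before the tone marks */
      if (c >= 0x3041 && c <= 0x3096)
        continue;                       /* hiragana letters */
      if (c >= 0x30A1 && c <= 0x30FC)
        continue;                       /* katakana letters, middle dot,
                                           prolonged sound mark */
      if (c >= 0x4E00 && c <= 0x9FFF)
        continue;                       /* CJK unified ideographs */

      return 0;                         /* anything else: not "plain" */
    }

  return 1;
}
```

A caller along the lines of text_is_simple() could consult such a
predicate before asking Uniscribe; Thai text, or kana followed by a
combining voicing mark, would fall through to the existing path.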

--tml

Re: Distinct performance issues with Japanese only on win32 systems

by David E. Hollingsworth-2

Tor Lillqvist <tml@...> writes:

> Hmm, is this the time it takes to process the entire document through
> Pango? Or just one screenful? I.e., is the time proportional to the
> length of the document?

It's proportional to the length.  The real test is the C code I
referred to in the previous message.  The gedit "test" was just to
validate that the issue wasn't particular to my C code.


> It's been a while since I wrote that code... Have you experience of
> using Uniscribe? Do you think it would be possible to bypass the
> Uniscribe itemization then assuming that Pango's itemization is
> "good enough" for Uniscribe, too? Are you able to build Pango for
> Windows, can you experiment and come up with a patch?

It turns out that reitemizing the items is not the primary issue.  I
now have some test code that runs as low as 40ms (110ms typical) when
used outside the pango environment but takes roughly 2s when called
from basic_engine_shape().  I don't have an explanation for this yet,
but I suspect it has something to do with state set on the HDC at the
time.  My goal is to create a patch and I'll report back if I get
stuck.


> It might be worthwhile then to explicitly check for a simple case of
> just plain CJK characters and make text_is_simple() explicitly
> return true in that case then?

This does work, and for my sample Japanese texts the generated glyphs
are identical.  But my hypothesis (that the issue was related to
Japanese's use of multiple scripts) is busted: the same problem shows
up with Thai, which is definitely a complex script and requires the
use of Uniscribe.  So this would be a CJK hack and the general problem
would remain.

--
"I've just found the silverware and I'm sticking a fork in that square!" - N.H.

Re: Distinct performance issues with Japanese only on win32 systems

by David E. Hollingsworth-2

A few weeks ago I reported a performance issue regarding the use of
Uniscribe in the win32 basic shaper.

It turns out that the issue was with the use of Uniscribe's
SCRIPT_CACHE.  Despite the name, this value is for caching
per-font-plus-size values, not per-script values.  See:

  http://msdn.microsoft.com/en-us/library/dd317726(VS.85).aspx

Since shaping occurs on runs containing a single font, we only need
one SCRIPT_CACHE in uniscribe_shape.  I'm including a patch below that
does this.  With the patch applied, the time for one of my tests went
from 20s to .3s.
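
The ownership pattern behind this can be sketched without Uniscribe.
In the stand-in below, fake_cache plays the role of SCRIPT_CACHE: the
caller initializes it to NULL and passes its address, and the callee
fills it on first use.  All names are hypothetical, the "fill" is just
a counter, and the sketch does not attempt to reproduce the timings
reported here; it only shows that, for a single font, one cache is
filled once while a per-script array is filled once per script
encountered.

```c
#include <assert.h>
#include <stddef.h>

#define N_SCRIPTS 4

typedef void *fake_cache;        /* stand-in for SCRIPT_CACHE */

static int cache_fills = 0;      /* how often the "expensive" fill ran */
static int dummy_font_data;      /* what a filled cache points at */

/* Stand-in for a shaping call: fills the cache on first use only. */
static void
fake_shape (fake_cache *cache)
{
  assert (cache != NULL);

  if (*cache == NULL)
    {
      *cache = &dummy_font_data;
      cache_fills++;
    }
}

/* Stand-in for releasing a cache: resets the handle to NULL. */
static void
fake_free_cache (fake_cache *cache)
{
  *cache = NULL;
}

/* One cache slot per script, as in the code before the patch. */
static int
fills_with_per_script_caches (const int *scripts, int n)
{
  fake_cache caches[N_SCRIPTS] = { NULL };
  int i;

  cache_fills = 0;
  for (i = 0; i < n; i++)
    {
      assert (scripts[i] >= 0 && scripts[i] < N_SCRIPTS);
      fake_shape (&caches[scripts[i]]);
    }
  for (i = 0; i < N_SCRIPTS; i++)
    fake_free_cache (&caches[i]);

  return cache_fills;
}

/* One cache for the whole run, passed by address to every call. */
static int
fills_with_one_cache (const int *scripts, int n)
{
  fake_cache cache = NULL;
  int i;

  cache_fills = 0;
  for (i = 0; i < n; i++)
    {
      assert (scripts[i] >= 0 && scripts[i] < N_SCRIPTS);
      fake_shape (&cache);
    }
  fake_free_cache (&cache);

  return cache_fills;
}
```

With the script sequence { 0, 1, 2, 0, 1 }, the per-script version
fills three caches and the single-cache version fills one.  Note that
in both versions the handle is passed by address, never by value.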

  --deh!


--- unpatched-pango/modules/basic/basic-win32.c 2009-07-30 23:47:43 +0000
+++ patched-pango/modules/basic/basic-win32.c 2009-08-10 23:34:29 +0000
@@ -581,7 +581,7 @@
 #endif
 
       items[item].a.fRTL = analysis->level % 2;
-      if ((*script_shape) (hdc, &script_cache[script],
+      if ((*script_shape) (hdc, script_cache,
    wtext + items[item].iCharPos, itemlen,
    G_N_ELEMENTS (iglyphs),
    &items[item].a,
@@ -611,7 +611,7 @@
  nglyphs, glyphs->log_clusters + ng,
  char_offset);
 
-      if ((*script_place) (hdc, &script_cache[script], iglyphs, nglyphs,
+      if ((*script_place) (hdc, script_cache, iglyphs, nglyphs,
    visattrs, &items[item].a,
    advances, offsets, &abc))
  {
@@ -673,7 +673,7 @@
   long wlen;
   int i;
   gboolean retval = TRUE;
-  SCRIPT_CACHE script_cache[100];
+  SCRIPT_CACHE script_cache;
 
   if (!pango_win32_font_select_font (font, hdc))
     return FALSE;
@@ -684,11 +684,10 @@
 
   if (retval)
     {
-      memset (script_cache, 0, sizeof (script_cache));
+      memset (&script_cache, 0, sizeof (script_cache));
       retval = itemize_shape_and_place (font, hdc, wtext, wlen, analysis, glyphs, script_cache);
-      for (i = 0; i < G_N_ELEMENTS (script_cache); i++)
- if (script_cache[i])
-  (*script_free_cache)(&script_cache[i]);
+      if (script_cache)
+  (*script_free_cache)(&script_cache);
     }
 
   if (retval)

Re: Distinct performance issues with Japanese only on win32 systems

by Tor Lillqvist

> Since shaping occurs on runs containing a single font, we only need
> one SCRIPT_CACHE in uniscribe_shape.  I'm including a patch below that
> does this.  With the patch applied, the time for one of my tests went
> from 20s to .3s.

Wow! Thank you very much for investigating. Will apply the patch.

--tml

Re: Distinct performance issues with Japanese only on win32 systems

by Tor Lillqvist

> Wow! Thank you very much for investigating. Will apply the patch.

I already applied the patch in mid-August to the stable and
development branches, but unfortunately it turns out that after the
patch, shaping of complex scripts is quite broken. Sorry that I didn't
notice earlier. The Arabic in gtk-demo looks horrible: it uses
unconnected letters. Unless I find some quick fix that doesn't affect
the main point of the patch, I will have to revert it.

--tml

Re: Distinct performance issues with Japanese only on win32 systems

by David E. Hollingsworth

My apologies for not running gtk-demo to check before offering the
patch!

I'm not going to be able to look into it more for a while, so
reverting may be the right choice at this time.

David E Hollingsworth
