RTL support


RTL support

by Gregg Reynolds

Hi,

Regarding support for Arabic, Hebrew, etc. there are three elements:

   a.  RTL layout
   b.  Shaping
   c.  Bidi reordering

For myself, support for the first two, without bidi reordering, would be
  fantastic.  Usually when I work in Arabic, I don't need bidi support.
  IMO it should be considered an add-on.  Actually I think the way Vim
does it is the way to go - each feature can be dis/enabled independently.

So how hard would it be to add plain RTL layout first, and then Arabic
shaping?

thanks,

gregg


_______________________________________________
emacs-bidi mailing list
emacs-bidi@...
http://lists.gnu.org/mailman/listinfo/emacs-bidi

Re: RTL support

by Eli Zaretskii

> Date: Tue, 22 Nov 2005 11:05:43 -0600
> From: Gregg Reynolds <gar@...>
>
> Regarding support for Arabic, Hebrew, etc. there are three elements:
>
>    a.  RTL layout
>    b.  Shaping
>    c.  Bidi reordering

No, there are only two: shaping and Bidi reordering.  The former is
relevant for Arabic scripts alone, AFAIK (Hebrew certainly doesn't
need that, but I'm not sure whether there are scripts besides Arabic
that need it).
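
A minimal sketch of what "shaping" refers to, assuming Python's standard
unicodedata module. The names BEH_FORMS and describe_forms are illustrative
only. It shows that Unicode's Arabic Presentation Forms-B block carries four
contextual glyph forms for the single letter BEH (U+0628), each of which
decomposes back to the same base letter, whereas a Hebrew letter such as ALEF
has no decomposition at all. A real shaping engine picks among such forms
from the neighbouring letters; this sketch only lists them.

```python
import unicodedata

# The four contextual presentation forms of ARABIC LETTER BEH (U+0628),
# taken from the Arabic Presentation Forms-B block.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def describe_forms(forms):
    """Return (character name, decomposition) pairs for each form."""
    return [(unicodedata.name(ch), unicodedata.decomposition(ch)) for ch in forms]


for name, decomp in describe_forms(BEH_FORMS):
    # Each decomposition names the position tag and the base letter 0628.
    print(f"{name:36} {decomp}")

# HEBREW LETTER ALEF has no decomposition: the result is the empty string.
print(repr(unicodedata.decomposition("\u05D0")))
```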

RTL layout is an integral part of bidi reordering.

> For myself, support for the first two, without bidi reordering, would be
>   fantastic.  Usually when I work in Arabic, I don't need bidi support.

I don't speak Arabic, so I cannot say how useful it is to have
right-to-left display without bidi reordering.  I _can_ tell you that
bidi support is a must for Hebrew (because numbers should be displayed
left to right), and the way Arabic related bidi features are specified
in the Unicode Bidirectional Algorithm makes me wonder how come such
an elaborate scheme (more complex than the scheme used in Hebrew) was
invented if Arabic can be written without it.
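
The point about numbers can be seen in the character properties themselves.
The sketch below assumes Python's standard unicodedata module. SAMPLES and
bidi_class are hypothetical names chosen for illustration. It prints the
bidirectional class that the Unicode Character Database assigns to a few
characters: Hebrew and Arabic letters are strongly right-to-left (R and AL),
while European and Arabic-Indic digits get their own number classes (EN and
AN), which is what lets a bidi-aware display keep digit runs in left-to-right
order inside right-to-left text.

```python
import unicodedata

# A few representative characters, keyed by their Unicode names.
SAMPLES = {
    "HEBREW LETTER ALEF": "\u05D0",
    "ARABIC LETTER BEH": "\u0628",
    "DIGIT ONE": "1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
    "LATIN SMALL LETTER A": "a",
}


def bidi_class(ch):
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


for label, ch in SAMPLES.items():
    print(f"{label:24} {bidi_class(ch)}")
```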

> So how hard would it be to add plain RTL layout first, and then Arabic
> shaping?

I have no idea.  I certainly am not going to work on RTL without bidi.
If you want partial bidi support, you might try the m17n version or
hebeng.el (which you could hack to support Arabic).

As for Arabic shaping, I think some work was or is planned in the
Unicode branch.  You may wish to try searching the emacs-devel
archives for suitable keywords, I think Kenichi Handa posted some
messages there in the past.


Re: RTL support

by Gregg Reynolds

Benjamin Riefenstahl wrote:

> Hi Eli, Gregg,
>
> To amplify a bit:
> >
>>I _can_ tell you that bidi support is a must for Hebrew (because
>>numbers should be displayed left to right),
>
> It's the same in Arabic.  Even the traditional Arabic version of the
> digits (what are called "Indic" digits in Arabic) are written from
> left to right.  Not to mention that in modern Arabic (as in Hebrew, I
> think) there are often words in Latin script interspersed in the text,
> one just needs to think of trademarks.
>

Well, actually, the notion that Arabic/Hebrew et al. is bidirectional is
a bit of brokenness inherited by Unicode.  So we're stuck with it,
obviously, no matter how stupid it is.  It's nice to have, if you need
to mix languages, but if you are working monolingually it's unnecessary.

So bidi algorithm support is a must for interpreting/generating Unicode
and other legacy *encodings*; it is most certainly not a must for RTL
*text*.  Vim has no support for the bidi algorithm; I use it frequently
to work in Arabic, with no ill effect.  Hebrew is no different.

The point being simply that RTL layout makes for a perfectly usable
editor with or without bidi support.

-gregg


Re: Re: RTL support

by Behdad Esfahbod

On Tue, 22 Nov 2005, Gregg Reynolds wrote:

> The point being simply that RTL layout makes for a perfectly usable
> editor with or without bidi support.

In your opinion.

> -gregg

--behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
        -- Dan Bern, "New American Language"


Re: RTL support

by Gregg Reynolds

Eli Zaretskii wrote:
>>Date: Tue, 22 Nov 2005 11:05:43 -0600
>>From: Gregg Reynolds <gar@...>
>>
...
> in the Unicode Bidirectional Algorithm makes me wonder how come such
> an elaborate scheme (more complex than the scheme used in Hebrew) was
> invented if Arabic can be written without it.
>
1.  It was legacy, so Unicode had to support it.  Then they went berserk
with it.
2.  Whoever made that first fateful design mistake either didn't
understand what he was doing, or else designing in the service of the
Arabic/Hebrew/etc speaking community was not a priority (making Western
software work for those languages cheaply was most likely the
motivation, hence the desire to avoid handling LSD-first digits.  But
that's just my speculation.)

>
>>So how hard would it be to add plain RTL layout first, and then Arabic
>>shaping?
>
>
> I have no idea.  I certainly am not going to work on RTL without bidi.
> If you want partial bidi support, you might try the m17n version or
> hebeng.el (which you could hack to support Arabic).
>
> As for Arabic shaping, I think some work was or is planned in the
> Unicode branch.  You may wish to try searching the emacs-devel
> archives for suitable keywords, I think Kenichi Handa posted some
> messages there in the past.
>

Thanks, I'll take a look.

-gregg



Re: RTL support

by Gregg Reynolds

Benjamin Riefenstahl wrote:
> Hi Gregg,
>
Hi Benny,

Thanks for your reasoned reply.  Comments below.

>
> Gregg Reynolds writes:
>
>>1.  It was legacy, so Unicode had to support it.  Then they went
>>    berserk with it.
>
>
> From my POV, there are very good reasons to consistently encode
> characters in the order in which they are written.  You don't want
> visual layout for any other operation except display.  You might think
> that display is the most important operation on text, but for large
> bits of most software it isn't.

Two things.  One is, directionality is a design choice, not a reflection of
some kind of objective reality.  This is obvious if you stare at some
RTL text and think for a while.  However, the Unicode book claims that
RTL languages are "inherently" bidirectional.  This is hogwash.

Second, "the order in which [characters] are written" is not relevant to
an encoding model.  There is no necessary relationship between the IO
model implemented by an application and the corresponding textual
representation, which is application independent.  Specifically, your
editor can support data entry of digit strings as either LSD-first or
MSD-first, or both.  Neither data entry protocol has anything to do with
the way the data is encoded in persistent storage.  For that matter, the
internal encoding of an editor is independent of the data exchange
formats it im/exports.  Emacs being a great example of that.

In other words "reasons to consistently encode characters in the order
in which they are written"  is essentially meaningless.  (I say that as
a statement of fact, not as a flame.)

>
> You might think that RTL without bidi would be enough.  But once you
> have RTL, it becomes the job of the Unicode standard to define how
> mixed content is handled.  Mixed content is after all the driving
> force for Unicode in the first place.  I also think that most users

Hmm.  I think that's debatable.  I think unification of diverse encoding
schemes is the primary driver behind Unicode, but that's a digression.
More important is that RTL has no necessary relationship to mixed
content or bidi reordering.  If you only ever write documents in Arabic
(Hebrew, Persian, Pashto, whatever) then why do you need bidi?  You
don't; it's an unfortunate artifact of Western-driven standardization.

To be clear:  monolingual Arabic text is not mixed content, whether it
contains digit strings or not.  So why should an Arabic user pay the
Unicode tax of bidi support?

Don't get me wrong, I'm not saying the bidi algorithm is not useful or
nice to have.  But it's an add-on, not needed by the vast majority of
RTL documents produced in the world.  Yes, believe it or not, Arabs and
other RTL users actually don't need English, any more than we English
speakers need Arabic.  To this day, scholarly writings about Arabic in
English use transliteration.  Arabic is quite capable of the same, even
for acronyms like IBM or CIA.

It boils down to an economic argument.  For Arabic, we need a) RTL
layout (a purely graphical matter); and b) shaping.  Both of these are
(relatively) inexpensive to implement.  Support for bidi reordering is a
nice enhancement, but it's a) expensive; and b) unnecessary unless you
write in two or more languages in the same doc.

Ask yourself a simple question.  Software like Emacs has been around for
what, 30 years?  It gained support for e.g. Japanese, Korean, etc. years
ago.  But the 1 billion + people in the world who need RTL support are
still waiting.  Why is that?  IMHO, it's at least partially because of
the perceived but false association of RTL and bidi.  (I can cite
specific examples of vendors declining to support Arabic solely because
of the expense of implementing bidi support.)  The bidi algorithm is
complex and generally yucky.  Thought experiment:  imagine a world in
which nobody would implement English language software unless it had
bidi support.

Sincerely,

-gregg


Re: Re: RTL support

by Eli Zaretskii

> Date: Tue, 22 Nov 2005 16:13:58 -0600
> From: Gregg Reynolds <gar@...>
> Cc: emacs-bidi@...
>
> Benjamin Riefenstahl wrote:
> > Hi Eli, Gregg,

Benjamin, I consistently don't see your messages on the list.  Do you
know why that is?

> The point being simply that RTL layout makes for a perfectly usable
> editor with or without bidi support.

Not for me, nor for most of the users of bidi languages.  So you are
obviously in the minority here.

I responded to your other points elsewhere in this thread.


Re: Re: RTL support

by Eli Zaretskii

> Date: Tue, 22 Nov 2005 22:07:53 -0600
> From: Gregg Reynolds <gar@...>
> Cc: emacs-bidi@...
>
> >>1.  It was legacy, so Unicode had to support it.  Then they went
> >>    berserk with it.
> >
> >
> > From my POV, there are very good reasons to consistently encode
> > characters in the order in which they are written.  You don't want
> > visual layout for any other operation except display.  You might think
> > that display is the most important operation on text, but for large
> > bits of most software it isn't.
>
> Two things.  One is, directionality is a design choice, not a reflection of
> some kind of objective reality.

That's true, and we decided here a long time ago to store characters in
the logical order in Emacs buffers.  The reasons were not only that
most other software in the world made the same decision (and thus if
we want to be able to import text from outside we are better off with
logical order), but also that logical order makes common Emacs
operations, like searching, easier.
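
A small sketch of the searching argument, under stated assumptions: to_visual
is a hypothetical stand-in that models visual-order storage of a pure-RTL line
as a simple reversal, and the Hebrew sample text is invented for illustration.
It is not how Emacs stores or searches text. It only shows that a query typed
in logical order matches a logical-order buffer directly, while a visual-order
buffer requires the query to be reordered first.

```python
def to_visual(logical):
    """Model visual-order storage of a pure-RTL line as a plain reversal."""
    return logical[::-1]


# The search word, in the order a user would type it (logical order).
word = "\u05E9\u05DC\u05D5\u05DD"

# A three-word line with the search word in the middle, in logical order.
line_logical = "\u05D0\u05DE\u05E8 " + word + " \u05DC\u05DB\u05DD"
line_visual = to_visual(line_logical)

# Query and buffer share the typing order, so the search succeeds.
print(line_logical.find(word))

# The visual-order buffer holds the word back to front: find() returns -1.
print(line_visual.find(word))

# The match only appears once the query itself is reversed.
print(line_visual.find(word[::-1]))
```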

It is pointless to try to convince us now to change that design
decision.  Even if you come up with VERY convincing arguments (which
you didn't, as everything you wrote was on our table when we discussed
this back then), it will be a very hard job to make us revert that
decision.

> In other words "reasons to consistently encode characters in the order
> in which they are written"  is essentially meaningless.

They are not meaningless, they describe a conscious design decision
that was made after much discussion and deliberations.  We came to the
conclusion that logical-order storage will make the rest of bidi
support easier.

> It boils down to an economic argument.  For Arabic, we need a) RTL
> layout (a purely graphical matter); and b) shaping.  Both of these are
> (relatively) inexpensive to implement.  Support for bidi reordering is a
> nice enhancement, but it's a) expensive; and b) unnecessary unless you
> write in two or more languages in the same doc.

This is only true if we accept your assumption that text should be
stored within Emacs in visual order.  And we already rejected that
design.  So for us, bidi reordering during display is a must.

> Ask yourself a simple question.  Software like Emacs has been around for
> what, 30 years?  It gained support for e.g. Japanese, Korean, etc. years
> ago.  But the 1 billion + people in the world who need RTL support are
> still waiting.  Why is that?

Because precious few out of those 1 billion were able or willing to
help us integrate bidi reordering into the Emacs display engine.

> The bidi algorithm is complex and generally yucky.

Nevertheless, I think I succeeded in conquering it for Emacs.
Unfortunately, I ran out of free time soon after that, so I need help
in getting this to working, reliable support.  Then this support
could be extended by others to make Emacs a bidi editor.

> Thought experiment: imagine a world in which nobody would implement
> English language software unless it had bidi support.

Such arguments are fruitless now, when the design decisions were made
long ago.


Re: Re: RTL support

by Gregg Reynolds

Eli Zaretskii wrote:

>>Date: Tue, 22 Nov 2005 16:13:58 -0600
>>From: Gregg Reynolds <gar@...>
>>Cc: emacs-bidi@...
>>
>>Benjamin Riefenstahl wrote:
>>
>>>Hi Eli, Gregg,
>
>
> Benjamin, I consistently don't see your messages on the list.  Do you
> know why that is?
>
>
>>The point being simply that RTL layout makes for a perfectly usable
>>editor with or without bidi support.
>
>
> Not for me, nor for most of the users of bidi languages.  So you are
> obviously in the minority here.
>
Well, I for one am very sorry that a simple question had prompted such a
strange series of responses.  I have no idea why you bring up
majority/minority.  I had no idea that you were competent to speak on
behalf of "most users of bidi languages", especially since there is no
such thing as a "bidi language".  I really just wanted to know something
about RTL support in Emacs.  Obviously this is not the right place.

Please don't respond.  This thread has gone far enough.


Re: RTL support

by Uwe Brauer

>>>>> "Eli" == Eli Zaretskii <eliz@...> writes:



   Eli> That's true, and we decided here a long time ago to store
   Eli> characters in the logical order in Emacs buffers.  The reasons
   Eli> were not only that most other software in the world made the
   Eli> same decision (and thus if we want to be able to import text
   Eli> from outside we are better off with logical order), but also
   Eli> that logical order makes common Emacs operations, like
   Eli> searching, easier.

Right.  Although this was a long time ago, I recall I gave some Lisp
implementation a try in which a visual method was used.  Even
Ehud's version (which was by far the best around) did not do line
breaking very well.  I tried to set up my own and did not succeed in
anything really usable, so I am not sure whether it is that easy, at
least not at the Lisp level.  Maybe if it is done correctly at the C
level it is different.



Re: Re: RTL support

by Eli Zaretskii

> Date: Tue, 22 Nov 2005 22:48:02 -0600
> From: Gregg Reynolds <gar@...>
> CC: b.riefenstahl@..., emacs-bidi@...
>
> Well, I for one am very sorry that a simple question had prompted such a
> strange series of responses.

What is strange about it?  That we all disagreed with you?

> I have no idea why you bring up majority/minority.  I had no idea
> that you were competent to speak on behalf of "most users of bidi
> languages"

I wasn't speaking on behalf of anyone.  I merely told you what we
decided in this forum when we discussed the design of Emacs support
for languages that need bidirectional editing.  Here, ``we'' means
``all those who were interested enough in Emacs support for bidi to
participate, and who knew enough about Emacs internals to contribute
to that discussion''.

So obviously the views and opinions expressed here are only in the
context of Emacs design, they do not pretend to be broader than that,
even if the specific wording might indicate otherwise.  This is, after
all, the "Emacs bidi" mailing list, no more, no less.

> I really just wanted to know something about RTL support in Emacs.
> Obviously this is not the right place.

This _is_ the right place.  I tried to answer your questions, sorry if
I failed.


Re: RTL support

by Martin J. Dürst

At 07:26 05/11/23, Gregg Reynolds wrote:

> 1.  It was legacy, so Unicode had to support it.  Then they went berserk
> with it.
> 2.  Whoever made that first fateful design mistake either didn't
> understand what he was doing, or else designing in the service of the
> Arabic/Hebrew/etc speaking community was not a priority (making Western
> software work for those languages cheaply was most likely the motivation,
> hence the desire to avoid handling LSD-first digits.  But that's just my
> speculation.)

Well, Unicode is of course about encoding all scripts of the
world, whatever the direction. It seems extremely obvious that
in that context, you'd try to come up, or adopt, a solution
that didn't only allow each script to work on its own, but
also different scripts together. The final algorithm is
probably more complex than it really needed to be, but that's
similar for most standards. Calling it 'berserk' doesn't help
in my view.

Regarding LSD (least significant digit) first, that's of course
the crucial point. If you say that making Western software
work for RTL languages cheaply was the motivation for the
bidi algorithm, and for making RTL languages inherently bidi,
then you seem to say that implementing LSD first is even more
difficult/expensive than implementing bidi. I'd probably have
to agree with that: While the technical details of a single
LSD-first number are much easier, making sure that everybody
in the world always knows which numbers are MSD-first and
which numbers are LSD-first would be a very expensive nightmare.
Messing up things like 123 and 321 can easily get expensive.
Having text, rather than numbers, run the wrong way at times,
doesn't look better, but is much better re. error detection.


Regards,    Martin.



Re: RTL support

by Gregg Reynolds

Martin Duerst wrote:

> At 07:26 05/11/23, Gregg Reynolds wrote:
>
>> 1.  It was legacy, so Unicode had to support it.  Then they went
>> berserk with it.
>> 2.  Whoever made that first fateful design mistake either didn't
>> understand what he was doing, or else designing in the service of the
>> Arabic/Hebrew/etc speaking community was not a priority (making Western
>> software work for those languages cheaply was most likely the
>> motivation, hence the desire to avoid handling LSD-first digits.  But
>> that's just my speculation.)
>
> Well, Unicode is of course about encoding all scripts of the
> world, whatever the direction. It seems extremely obvious that
> in that context, you'd try to come up, or adopt, a solution
> that didn't only allow each script to work on its own, but
> also different scripts together. The final algorithm is
> probably more complex than it really needed to be, but that's
> similar for most standards. Calling it 'berserk' doesn't help
> in my view.
>
> Regarding LSD (least significant digit) first, that's of course
> the crucial point. If you say that making Western software
> work for RTL languages cheaply was the motivation for the
> bidi algorithm, and for making RTL languages inherently bidi,

No, I was speculating that that might have had something to do with
modeling RTL digit strings as MSD-first.  Without that, you have
problems with math routines.  If we were starting from scratch today
that might not be a big problem, but in the 50s and 60s processor time
was hugely expensive, and most (business) computing was bean-counting.
There were probably good economic reasons at the time in favor of the
MSD-first design.  But that's idle speculation.

> then you seem to say that implementing LSD first is even more
> difficult/expensive than implementing bidi. I'd probably have

Not at all; only with respect to functions etc. that interpret digit
strings as numbers.

> to agree with that: While the technical details of a single
> LSD-first number are much easier, making sure that everybody
> in the world always knows which numbers are MSD-first and
> which numbers are LSD-first would be a very expensive nightmare.
> Messing up things like 123 and 321 can easily get expensive.
> Having text, rather than numbers, run the wrong way at times,
> doesn't look better, but is much better re. error detection.

Similar arguments were made on the Unicode list not too long ago.  Let's
please not open up that debate here ;), but for what it's worth I never
understood what the worry is.  Personally I don't see any possibility of
confusion, but others clearly do.

-gregg


Re: RTL support

by Eli Zaretskii

> Date: Thu, 24 Nov 2005 12:37:42 +0900
> From: Martin Duerst <duerst@...>
> Cc: emacs-bidi@...
>
> Regarding LSD (least significant digit) first, that's of course
> the crucial point. If you say that making Western software
> work for RTL languages cheaply was the motivation for the
> bidi algorithm, and for making RTL languages inherently bidi,
> then you seem to say that implementing LSD first is even more
> difficult/expensive than implementing bidi.

Unless I'm missing something, these issues have nothing to do with
what we were discussing.  We weren't discussing how to encode bidi
text in a file or in general; we were discussing how to hold it within
Emacs buffers and strings.  The latter is an internal Emacs matter
that shouldn't bother users at all.  The only valid arguments for how
to store RTL text within Emacs buffers and strings are those which
compare the difficulty of adding bidi support to relevant Emacs
features.  That is, one must speak about Emacs design and structure,
not about anything else.

When we discussed this in the past, the conclusion was that storing
RTL text in the visual order will require bidi-related changes in many
places in Emacs, both in many primitive operations and in application
C and Lisp code.  By contrast, logical-order storage required changes
in a small number of well-isolated parts of low-level code, mainly in
display code and in some of the primitives that translate screen to
buffer position and back.
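
A deliberately simplified sketch of why logical-order storage pushes the work
into display code. The helpers plain_reverse and naive_rtl_visual are
hypothetical names for illustration, and the second is a crude stand-in, not
the Unicode Bidirectional Algorithm and not Emacs's display engine. Given a
line stored in typing order, drawing it right to left with no reordering
flips the digits, while a display step that restores each digit run keeps
the number readable. The returned strings list characters in left-to-right
screen order, and ascii() is used so the terminal does not reorder the output.

```python
import re


def plain_reverse(logical):
    """RTL layout only: draw the stored characters from right to left."""
    return logical[::-1]


def naive_rtl_visual(logical):
    """Reverse the line, then restore each run of digits to its stored order."""
    return re.sub(r"\d+", lambda m: m.group(0)[::-1], logical[::-1])


# Logical (typing) order: two Hebrew letters, a number, one more letter.
logical = "\u05D0\u05D1 123 \u05D2"

# Without reordering, the digit run appears on screen as 321.
print(ascii(plain_reverse(logical)))

# With the digit runs restored, the number still reads 123.
print(ascii(naive_rtl_visual(logical)))
```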


Re: RTL support

by Benjamin Riefenstahl

Hi Gregg,


I am not an Emacs developer, and I don't plan to work on this issue
right now.  I also don't believe that you have brought up new
arguments to change the decisions about how this is to be done in Emacs
in the future.  I do think that an occasional check of these ideas is
a good thing, though.

So this exchange is mostly about: "Would I personally find useful
software that worked along the lines that you suggest."

Gregg Reynolds writes:
> Two things.  One is, directionality is a design choice, not a
> reflection of some kind of objective reality.

The fact that Arabic and other scripts are written and read from right
to left is a design feature of the script that we can't just ignore
when we implement it in computers, we have to deal with it at some
level.  The question is at *which* level.

> There is no necessary relationship between the IO model implemented
> by an application and the corresponding textual representation,

Exactly.  Which is why Unicode put the complicated parts into the IO
model (for human IO) with BIDI reordering, while any software module
that doesn't have human IO can completely ignore the issue.  The same
goes for most software that directly implements human IO but uses
pre-fabricated building blocks for it (using e.g. GTK or Qt).

If OTOH you use visual ordering in the encoding you make life easier
for a few primitive versions of the IO and complicated for all the
rest of the software.  Not to mention that it makes it even more
complicated for more advanced - read: user-friendly - versions of IO.

There is a third possibility in our case, using visual order within
Emacs and only storing the text in logical order.  That is possible in
a simple text editor (and I am sure there are some of those around).
But Emacs does a lot more, of course.  Every module in Emacs that
needs to look at the logical order would have to make the reordering
anyway.  And as Emacs is about text processing that would probably be
a lot of modules.

That's the choice.  I personally prefer the first way of doing it.

> More important is that RTL has no necessary relationship to mixed
> content or bidi reordering.  If you only ever write documents in
> Arabic (Hebrew, Persian, Pashto, whatever) then why do you need
> bidi?

A large part (maybe still a majority) of the people that write Arabic
and Hebrew on computers write in more than just one language.  This is
even if you discount numbers and trademarks.

> To be clear: monolingual Arabic text is not mixed content, whether
> it contains digit strings or not.  So why should an Arabic user pay
> the Unicode tax of bidi support?

A large part of the user base right now does need mixed content.  So
you would get the tax of supporting several versions of software, the
software for people that don't need mixed content and another version
for people that do.  Even if the first version on its own might be
cheaper, on the whole this will get more costly.  Not to mention that
it would end up in a system where the "natives" get the "stupid"
mono-lingual software and the "experts" and the westerners can afford
the "intelligent" software for the mixed content.


benny



Re: RTL support

by Gregg Reynolds

Benjamin Riefenstahl wrote:
> Hi Gregg,
>

Hi Benny,

>
> I am not an Emacs developer, and I don't plan to work on this issue
> right now.  I also don't believe that you have brought up new
> arguments to change the decisions about how this is to be done in Emacs
> in the future.  I do think that an occasional check of these ideas is
> a good thing, though.

Fair enough.  (However, my original post wasn't intended to be
argumentative, but to ask about some specific design options.  So to be
clear, I don't mean to advocate any particular option at this point,
since I don't know enough about Emacs internals.  The general
observation that graphical layout, text reordering (via bidi or any
other algo), and shaping are mutually orthogonal applies generally to
any notion of text processing.)

>
> So this exchange is mostly about: "Would I personally find useful
> software that worked along the lines that you suggest."
>
Yep; this is my itch.  I happen to think scratching it would benefit
many others, but of course that is (informed) speculation.

> Gregg Reynolds writes:
>
>>Two things.  One is, directionality is a design choice, not a
>>reflection of some kind of objective reality.
>
>
> The fact that Arabic and other scripts are written and read from right
> to left is a design feature of the script that we can't just ignore
> when we implement it in computers, we have to deal with it at some
> level.  The question is at *which* level.

Sorry, I wasn't clear.  I mean that modeling a single script/language as
mono- or bi-directional is a design choice, not a statement of a Law of
Nature.  This might be better expressed by saying the choice of number
polarity - MSD or LSD first in strings - is a design choice.  One could
model English text with LSD-first digit strings if one wanted.

This means, among other things, that "RTL" does not imply "bidi", any
more than "LTR" does.  RTL/LTR refers solely to graphical syntax, not to
an encoding model.

>
>>There is no necessary relationship between the IO model implemented
>>by an application and the corresponding textual representation,
>
>
> Exactly.  Which is why Unicode put the complicated parts into the IO
> model (for human IO) with BIDI reordering, while any software module
> that doesn't have human IO can completely ignore the issue.  The same

I don't understand what you say here.   Unicode as I understand it
doesn't have anything at all to say about IO; it just defines character
semantics and syntax (accent after base char, etc.)  Note that there are
no complicated parts for monodirectional text.  It's the bidi
requirement itself that creates the complication.

Another clarification:  I'm not arguing against bidi support where it is
truly needed, namely in mixed language texts.  Nor am I arguing that
Emacs should not have bidi support - it should, obviously.

I guess the point is that we can get there in stages.  First you
implement RTL layout, then shaping, then bidi.  That way we have
*usable* software without having to wait for bidi support, and
eventually we do have full bidi support.  Vim provides the model: you
can switch on/off RTL layout and Arabic shaping independently; hopefully
someday somebody will add bidi support too.  But in the meantime it is
very useful for working with Arabic text.  I'd simply like for Emacs to
be as useful, since I'm firmly in the Emacs camp when it comes to editors.

> goes for most software that directly implements human IO but uses
> pre-fabricated building blocks for it (using e.g. GTK or Qt).
>
> If OTOH you use visual ordering in the encoding you make life easier
> for a few primitive versions of the IO and complicated for all the
> rest of the software.  Not to mention that it makes it even more
> complicated for more advanced - read: user-friendly - versions of IO.

I don't see how.  Can you provide an example of how this would make
things more complicated?  I mean other than with math routines.  That I
admit is the big problem.  There are ways around it, but that's for
another thread.
>
> There is a third possibility in our case, using visual order within
> Emacs and only storing the text in logical order.  That is possible in
> a simple text editor (and I am sure there are some of those around).
> But Emacs does a lot more, of course.  Every module in Emacs that
> needs to look at the logical order would have to make the reordering
> anyway.  And as Emacs is about text processing that would probably be

I don't see why.  Example?

> a lot of modules.
>
> That's the choice.  I personally prefer the first way of doing it.

I think we're talking about two separate things.  In my opinion, the
internal encoding used by Emacs is irrelevant, so long as I know what it
is.  I just want RTL layout and Arabic shaping, both of which simply
operate on a string of chars/glyphs.

Actually, even when Emacs has full bidi support, I would still want a
"transparent" mode that will provide a graphical representation of the
true (physical) ordering of the text.

The question of how best to represent text internally is an interesting
one, but I haven't given it much thought.  I do think Emacs did the
right thing by *not* adopting Unicode as its internal representation.

>
>
>>More important is that RTL has no necessary relationship to mixed
>>content or bidi reordering.  If you only ever write documents in
>>Arabic (Hebrew, Persian, Pashto, whatever) then why do you need
>>bidi?
>
>
> A large part (maybe still a majority) of the people that write Arabic
> and Hebrew on computers write in more than just one language.  This is
> even if you discount numbers and trademarks.

Yes, I've heard this claimed many times, but I've never seen any
evidence to back it up.  My personal experience is that it is simply not
true.  In the Arab world, at least, *most* people do *not* operate in
multiple languages (just like in the US), and from what I've personally
seen they get along fine using Arabic only on a computer, just as most
Americans get along fine using English only.  Even scholarly articles
written in English about Arabic generally use transliteration.  Things
are no different in the Arab world.  When newspapers need to write "CNN"
or "FBI", they transliterate it.  The need for full mixed directional
support is quite specialized, probably everywhere in the world.

Add to that the fact that multilanguage computing w/out bidi support is
quite feasible.  I do it all the time using Vim and even Emacs.


>
>
>>To be clear: monolingual Arabic text is not mixed content, whether
>>it contains digit strings or not.  So why should an Arabic user pay
>>the Unicode tax of bidi support?
>
>
> A large part of the user base right now does need mixed content.  So

That may be true for the *current* emacs user base.  Then again, Emacs
has no RTL user base, since Emacs doesn't support RTL.  Whether or not
the potential RTL user base truly needs multilanguage (mixed
directionality) support is a matter of speculation.  But we *know* that
they need RTL layout and shaping, and we also know that RTL layout and
shaping is sufficient to make software useful.

Besides, to me the user base is everybody in the world.  Whoever wants
to use it, should be able to use it.  Lack of bidi support need not
prevent the software from being useful for people who don't need bidi
support.

> you would get the tax of supporting several versions of software, the
> software for people that don't need mixed content and another version
> for people that do.  Even if the first version on its own might be
> cheaper, on the whole this will get more costly.  Not to mention that
> it would end up in a system where the "natives" get the "stupid"
> mono-lingual software and the "experts" and the westerners can afford
> the "intelligent" software for the mixed content.

I guess I wasn't clear - see my note above.  As you note, it wouldn't
make much sense to support two RTL versions of a piece of software, one
with and one without bidi support.  But there would be no reason to do
so; RTL w/out bidi would just be a stage on the way to full bidi
implementation.

It's interesting that you perceive the "intelligent" software as the
stuff with bidi support.  In my experience it is just the opposite:
editors with Unicode bidi support are really stupid, from the end user
point of view.  They are often almost impossible to use, thanks to
bizarro cursor behaviour and the directional ambiguity Unicode
explicitly assigns to characters like punctuation, parens, etc.  I find
Vim much simpler and more user-friendly.

Somewhere in the GCC list archives there's a note from RMS in response
to an issue involving support for some obscure feature of the ISO C++
standard (if I recall correctly), in which he says it all in a very few
words, something along the lines of "Standards are recommendations; we
should design to meet the needs of our community; if the Standard helps
with that, then we support it, but if not we shouldn't hesitate to
ignore it and do what is best for the community."  Software support for
Unicode RTL scripts provides a classic example of getting things
backwards - designing to satisfy the standard instead of community needs.

In summary, there's more than one way to skin a cat, as the (American)
saying goes.  Emacs (and other software) can be quite useful to RTL
users without bidi support.  It's better to have bidi support,
naturally, but the cost of bidi implementation need not stand in the way
of providing useful stuff, and providing useful stuff by supporting
non-bidi RTL and shaping need not inhibit implementation of bidi support.

thanks,

gregg


Re: RTL support

by Eli Zaretskii

> Date: Fri, 25 Nov 2005 11:24:19 -0600
> From: Gregg Reynolds <gar@...>
> CC: emacs-bidi@..., Eli Zaretskii <eliz@...>
>
> I guess the point is that we can get there in stages.  First you
> implement RTL layout, then shaping, then bidi.  That way we have
> *usable* software without having to wait for bidi support, and
> eventually we do have full bidi support.

From what I understand, users of RTL languages want almost full bidi
support, they won't settle for RTL display alone.

> Vim provides the model

I never saw anyone use Vim for Hebrew text---that I can tell you.
Hebrew text without reordering is going backwards to the age of
typewriters when secretaries needed to learn to type numbers and other
LTR text backwards.

> But in the meantime it is very useful for working with Arabic text.

Again, users of Arabic scripts I talked to think otherwise: they want
bidi support, not just RTL display.

> > There is a third possibility in our case, using visual order within
> > Emacs and only storing the text in logical order.  That is possible in
> > a simple text editor (and I am sure there are some of those around).
> > But Emacs does a lot more, of course.  Every module in Emacs that
> > needs to look at the logical order would have to make the reordering
> > anyway.  And as Emacs is about text processing that would probably be
>
> I don't see why.  Example?

The simplest example would be incremental search.  Suppose you type
"C-s ABCD", where upper-case letters denote RTL characters (e.g.,
Arabic letters).  If text is stored in the buffer in visual order,
Emacs will have to reorder ABCD into DCBA before passing it to the
text-search primitive.

Likewise, any other Emacs function that receives input from the user
or from external applications will need to do the reordering from
logical to visual order.  In other words, many places in Emacs will
need to be changed to handle visual-to-logical reordering.  This would
make the job of adding RTL support to Emacs unbearably hard.
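
The incremental-search case can be sketched in a few lines of Python. This is only a toy under stated assumptions: upper-case letters stand in for RTL characters, the text is purely RTL so that visual order is just the reverse of logical order, and `to_visual` and `search` are hypothetical helpers, not Emacs primitives.

```python
def to_visual(logical: str) -> str:
    """Toy reordering for pure RTL text: visual order is the reverse."""
    return logical[::-1]


def search(buffer_text: str, needle: str) -> int:
    """Stand-in for a text-search primitive; returns -1 when not found."""
    return buffer_text.find(needle)


logical_text = "ABCD EFGH"               # what the user typed, in typing order
visual_buffer = to_visual(logical_text)  # what a visual-order buffer would hold

typed = "ABCD"                           # the user types the search string logically
naive_hit = search(visual_buffer, typed)                 # no reordering
reordered_hit = search(visual_buffer, to_visual(typed))  # reorder ABCD -> DCBA first
print(naive_hit, reordered_hit)
```

Every entry point that accepts typed or external text would need the equivalent of the `to_visual(typed)` call, which is the point being made above.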

In addition, while logical-to-visual reordering is a well-defined
operation, whereby for every logical-order string there's one and only
one visual-order string, the reverse is not true: for some
visual-order strings one can find more than one logical-order string
that, when reordered according to Unicode Bidirectional Algorithm,
will all give the original visual-order string.
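
A small Python sketch can show one way this ambiguity arises. It is a toy, not the Unicode Bidirectional Algorithm: upper-case letters stand in for RTL letters, lower-case for LTR letters, everything else simply takes the paragraph level, and only the final run-reversal step is modelled. The name `toy_reorder` is made up for this illustration.

```python
def toy_reorder(logical: str, base_rtl: bool) -> str:
    """Return the left-to-right visual order of a logical string (toy model)."""
    base = 1 if base_rtl else 0
    levels = []
    for ch in logical:
        if ch.isupper():        # stand-in for a strong RTL letter
            levels.append(1)
        elif ch.islower():      # stand-in for a strong LTR letter
            levels.append(2 if base_rtl else 0)
        else:                   # simplification: neutrals take the base level
            levels.append(base)

    chars = list(logical)
    highest = max(levels, default=0)
    # Reverse every maximal run at or above each level, highest level first.
    for level in range(highest, 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)


visual_from_ltr_paragraph = toy_reorder("abc DEF", base_rtl=False)
visual_from_rtl_paragraph = toy_reorder("DEF abc", base_rtl=True)
print(visual_from_ltr_paragraph, visual_from_rtl_paragraph)
```

In this model two different logical strings, one in an LTR paragraph and one in an RTL paragraph, yield the same sequence of characters on screen, so the visual string alone does not say which one was typed.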

> I just want RTL layout and Arabic shaping, both of which simply
> operate on a string of chars/glyphs.

But the Emacs developers want what the users of RTL languages want,
and that is bidi support, not just RTL display.  So any work done for
RTL support must be consistent with, and a part of, the full bidi
support, otherwise we, the Emacs developers, will object to including
it.

> The question of how best to represent text internally is an interesting
> one, but I haven't given it much thought.

We did give it much thought, and the conclusion was to use the logical
order.  That's why we must have bidi reordering in the display code.

> > A large part (maybe still a majority) of the people that write Arabic
> > and Hebrew on computers write in more than just one language.  This is
> > even if you discount numbers and trademarks.
>
> Yes, I've heard this claimed many times, but I've never seen any
> evidence to back it up.  My personal experience is that it is simply not
> true.  In the Arab world, at least, *most* people do *not* operate in
> multiple languages (just like in the US), and from what I've personally
> seen they get along fine using Arabic only on a computer, just as most
> Americans get along fine using English only.

As I and others pointed out here, even Arabic-only text needs bidi
reordering because of digits and other weak and neutral characters.
The only way to avoid this reordering is to store characters within
Emacs buffers and strings in visual order, which we decided not to do,
for reasons explained above.

> In summary, there's more than one way to skin a cat, as the (American)
> saying goes.  Emacs (and other software) can be quite useful to RTL
> users without bidi support.  It's better to have bidi support,
> naturally, but the cost of bidi implementation need not stand in the way
> of providing useful stuff, and providing useful stuff by supporting
> non-bidi RTL and shaping need not inhibit implementation of bidi support.

This is Free Software developed by volunteers.  And all the volunteers
we have that know about and use RTL languages unanimously decided they
wanted RTL support with bidi reordering.  It is okay for you to
disagree, but the only method to steer Emacs development your way
would be to submit code changes to add simple RTL display without bidi
reordering to Emacs.  If you write such code, and it is clean and
doesn't get in the way of the future bidi reordering support, I
promise you I will review the code and recommend it for inclusion.

But as long as you leave this job to us, we will do what we think is
right, and that is RTL with bidi reordering support.


Re: Re: RTL support

by Omer Zak

On Fri, 2005-11-25 at 11:24 -0600, Gregg Reynolds wrote:

> > A large part (maybe still a majority) of the people that write Arabic
> > and Hebrew on computers write in more than just one language.  This is
> > even if you discount numbers and trademarks.
>
> Yes, I've heard this claimed many times, but I've never seen any
> evidence to back it up.  My personal experience is that it is simply not
> true.  In the Arab world, at least, *most* people do *not* operate in
> multiple languages (just like in the US), and from what I've personally
> seen they get along fine using Arabic only on a computer, just as most
> Americans get along fine using English only.  Even scholarly articles
> written in English about Arabic generally use transliteration.  Things
> are no different in the Arab world.  When newspapers need to write "CNN"
> or "FBI", they transliterate it.  The need for full mixed directional
> support is quite specialized, probably everywhere in the world.
>
> Add to that the fact that multilanguage computing w/out bidi support is
> quite feasible.  I do it all the time using Vim and even Emacs.

Hello Gregg,
With all respect, your claims are so wrong I do not know where to start.
I live in Israel and I use computers for Hebrew wordprocessing, in
addition to other purposes and other kinds of text processing.

I do not know how it is in Arabic speaking countries, but in Israel,
full BiDi is mandatory in text editors and word processors that aim at
handling Hebrew.

Even in Hebrew monolingual text, numbers are written in LTR order.  So
you already need a BiDi algorithm.  In addition to this, Latin letters
are frequently used in Hebrew text, especially in articles about
technical topics.
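
The digits problem can be illustrated with a short Python sketch. Upper-case letters stand in for Hebrew letters, the text is stored in logical (typing) order, and both function names are hypothetical. The second function is only a crude stand-in for what a bidi algorithm does with numbers, not an implementation of one.

```python
import re


def naive_rtl_layout(logical: str) -> str:
    """RTL display with no reordering: mirror the whole line."""
    return logical[::-1]


def rtl_layout_keeping_digits(logical: str) -> str:
    """Mirror the line, then put each digit run back into LTR order."""
    mirrored = logical[::-1]
    return re.sub(r"[0-9]+", lambda m: m.group(0)[::-1], mirrored)


logical = "ABC 1948 DEF"                       # a year typed inside RTL text
without_reordering = naive_rtl_layout(logical)
with_digit_runs = rtl_layout_keeping_digits(logical)
print(without_reordering)
print(with_digit_runs)
```

With the naive layout the year comes out with its digits reversed; keeping it readable already requires treating digit runs differently from the surrounding letters.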

I consider any wordprocessor or editor without full BiDi support to be
broken and useless for editing Hebrew texts.

I use Emacs to edit Web pages for my Web site (http://www.zak.co.il/),
but when a Web page also has Hebrew text, I use gedit instead, because
it has full BiDi support.
                                              --- Omer
--
Delay is the deadliest form of denial.    C. Northcote Parkinson
My own blog is at http://www.livejournal.com/users/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html



Re: Re: RTL support

by Gregg Reynolds

Omer Zak wrote:
>
> Hello Gregg,
> With all respect, your claims are so wrong I do not know where to start.

I surrender.  How can one possibly argue against an argument as clever
as "you are wrong"?  Good work, Omer.


Re: RTL support

by Benjamin Riefenstahl

Hi Gregg,


Sorry, this is much too long again :-(

Gregg Reynolds writes:
> This might be better expressed by saying the choice of number
> polarity - MSD or LSD first in strings - is a design choice.  One
> could model English text with LSD-first digit strings if one wanted.

At the encoding level you could.  At the input level you have to
emulate how the users would do it on paper.  If you don't, users will
find that awkward and look for something else.  We don't design
scripts, we just design how they are implemented in computers.

> RTL/LTR refers solely to graphical syntax, not to an encoding model.

RTL refers to how I read, write and line-break a written phrase as a
human being.  Computers have to map this to their notions of graphics
and coordinate-systems.  And then they have to store it in a
convenient form.

> I guess the point is that we can get there in stages.  First you
> implement RTL layout, then shaping, then bidi.

It *is* done in stages.  We have had Hebrew modules in Elisp for ages.
We have emacs-bidi now.  The only thing is that this is not in the
stock Emacs.  The same as support for FE scripts was not in stock
Emacs until the developers thought it was mature enough and somebody
had (or made) the necessary time to integrate it.

This development process takes time.  The developers do it in their
free time, after all, and the issues are complicated.  Doing it in the
stages that you indicated would probably take more time (and require
more overhead in discussing it :-() than just doing it well enough in
fewer steps.

>> A large part (maybe still a majority) of the people that write
>> Arabic and Hebrew on computers write in more than just one
>> language.  This is even if you discount numbers and trademarks.
>
> Yes, I've heard this claimed many times, but I've never seen any
> evidence to back it up.  My personal experience is that it is simply
> not true.  In the Arab world, at least, *most* people do *not*
> operate in multiple languages (just like in the US),

Hm.  In your opinion, native Arabic speakers who only work in their own
language are how much of the whole user base for Arabic?  50%?  70%?
90%?

There are westerners studying Arabic, business types (native Arabs and
westerners) doing multi-language word-processing (think of any larger
company), there are programmers, scripters, folks building HTML pages,
even graphical designers that need to incorporate correct Western
elements into their designs, ... I've probably forgotten some
important groups.  All of these work with mixed content.  I'd say
these are probably much more than 30% of the computer users and even
just 10% would IMO be more than enough of a market to justify
implementing mixed content and not starting a separate code branch for
RTL only.

> Even scholarly articles written in English about Arabic generally
> use transliteration.

There were times when they didn't.  Now authors regularly apologize to
their readers that they can't do it.  So there is demand, but
publishers currently think it's too expensive.

> Things are no different in the Arab world.  When newspapers need to
> write "CNN" or "FBI", they transliterate it.

As far as I have seen (and you have probably more experience),
newspapers and magazines sometimes do and sometimes don't.  But
advertisements seem to be more likely to use at least their brand
names in Latin script.  And those advertisements targeting the wealthy
probably use English marketing phrases, as they do over here in
Germany, right?

>> A large part of the user base right now does need mixed content.  So
>
> That may be true for the *current* emacs user base.

The current base and the base of the *very* near future is what Emacs
is written for.  Like most software Emacs is not written for an ideal
world.  That would waste too many resources.

Emacs is actually an exception in that it is so easily adapted to
users' needs in most respects, that it has a much better flexibility
than a lot of other software.  This is more difficult with some things
than with others, as this example shows.  And even here we actually
have emacs-bidi now.

> Besides, to me the user base is everybody in the world.  Whoever
> wants to use it, should be able to use it.

In the way that you are framing it, I think that is unrealistic.


benny


PS: Clarifications of my previous post:

>> [...] Unicode put the complicated parts into the IO model (for
>> human IO) with BIDI reordering, while any software module that
>> doesn't have human IO can completely ignore the issue.  The same
>
> I don't understand what you say here.  Unicode as I understand it
> doesn't have anything at all to say about IO; it just defines
> character semantics and syntax (accent after base char, etc.)

Yes, Unicode specifies the encoding level.  But by exclusion those
things that are required by the task at hand, but are not specified in
Unicode itself, have to be done in other levels.  In this way Unicode
works on the basis of an implied architecture.

>>Not to mention that it makes it even more complicated for more
>>advanced - read: user-friendly - versions of IO.
>
> I don't see how.  Can you provide an example of how this would make
> things more complicated?

I have spent too much time on these posts already to think of exciting
examples, too.  Sorry ;-)

What I mean is that with a given visual encoding, if your IO model is
not the exact same as the encoding model for whatever reason, then the
encoding gets in your way in a big way and makes for even more
complicated code (== more bugs, less features).

So mandating a visual (or not-quite-logical) encoding was no realistic
choice for Unicode and is also not realistic for a text-processing
platform such as Emacs.

>> Every module in Emacs that needs to look at the logical order would
>> have to make the reordering anyway.  And as Emacs is about text
>> processing that would probably be a lot of modules.
>
> I don't see why.  Example?

E.g. search and replace.  I have a number of functions that do that to
fix up text automatically.  This works on Emacs' internal text model.
Sometimes I need to find stuff in a certain order, which is of course
the logical order.  If I were processing bidi text (say an XML file
containing Syriac content, that's something that I actually have) and
the text was not in logical order, I'd have to think about it.
Every time I code a search I'd have to see if the logical/visual
dichotomy has an impact in the particular case.
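
As a rough illustration of that review burden, here is a Python sketch of a single fix-up rule. The `title=` markup, the rule, and `visual_storage` are all invented for the example; upper-case letters stand in for RTL letters, and the toy storage model just keeps each RTL run reversed inside an LTR line.

```python
import re

# A fix-up rule written with logical order in mind: capture the RTL title.
RULE = re.compile(r"title=([A-Z]+)")


def visual_storage(logical: str) -> str:
    """Toy visual-order storage in an LTR line: each RTL run is kept reversed."""
    return re.sub(r"[A-Z]+", lambda m: m.group(0)[::-1], logical)


logical_buffer = "title=ABC id=7"
visual_buffer = visual_storage(logical_buffer)

from_logical = RULE.search(logical_buffer).group(1)
from_visual = RULE.search(visual_buffer).group(1)
print(from_logical, from_visual)
```

The rule still matches in the visual-order buffer, but what it captures is the title backwards, so the author of every such rule has to notice this and compensate.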

Let's say only half of the approximately thousand Elisp modules in
stock Emacs have a need for a similar review at a superficial level,
to see if they are compatible with the chosen visual or
not-quite-logical ordering.  That is a lot of work.  With a plain
logical ordering, such a review is probably still needed in some
places, but changes should be necessary much more rarely.


