Thoughts on GNUSpeech and possible accessibility applications

Thoughts on GNUSpeech and possible accessibility applications

by Jason White-14

Hello,

I've been monitoring the archives for a while, but I thought it was time to
subscribe to the list.

I would be interested in helping to contribute beta testing to the project, as
well as any ideas or experience that might be of benefit. As a computer user
who happens to be blind, I have been using speech synthesis since the early
1980s. (These days, I rely more on braille devices, but speech still plays a
significant role).

Although there are excellent free (as in freedom) screen readers and
speech-based user interfaces available for GNU/Linux, such as Emacspeak
(http://emacspeak.sourceforge.net/), SpeakUp (http://www.linux-speakup.org/)
and Orca (http://www.gnome.org/projects/orca/), the quality of free text to
speech systems is, in my judgment at least, somewhat inadequate. To be
specific about this, I haven't heard any free software that even comes close
to competing with the DECTalk synthesizer which is on my desk here. Moreover,
the proprietary speech synthesis systems (available as software only rather
than hardware) for GNU/Linux all incur licensing fees, and owing to the lack
of access to source code, bugs can't be fixed by the developers of screen
readers and speech-based user interfaces, or by users with programming skills.

A high-quality, free synthesizer could also be integrated by default into
GNU/Linux distributions, and made available in devices that employ free and
open-source software, for example mobile telephones
(http://eyes-free.googlecode.com/ exemplifies the latter, and currently uses
ESpeak as its synthesizer).

Is there interest among participants in the GNUSpeech project in its potential
to support such applications? If so, the porting of the text to speech server
to GNU/Linux would be a necessary prerequisite, but the development
environment would also need to be available to enable the implementation of
additional languages. I am also interested in whether the possible
accessibility applications of the project might help to attract development
resources. I don't know any possible sources of funding or developers at
present, but I would gladly participate in any such discussions.

Since the text to speech system doesn't run under GNU/Linux yet, I haven't
been able to test it. However, the paper and sample files at
http://pages.cpsc.ucalgary.ca/~hill/papers/avios95/menu.htm

were very useful. My initial impression is that I find GNUSpeech difficult to
understand, partly due to the mixture of British phonetics and North American
pronunciation that leads, for example, to pronounced "r" and "l" sounds where
they occur in American, but not British English. However, I like the rhythm
and intonation, which I know from having read the papers are the subject of
substantial research. I don't understand speech synthesis sufficiently to know
whether the quality of the speech could be easily improved by fine-tuning the
dictionaries and databases, making use of the part of speech information, etc.

For the accessibility applications mentioned above, there are other
requirements that would need to be satisfied, and again I would be pleased to
contribute to such discussions in the event that they become relevant to the
project. I am also aware that I am by no means alone in desiring a
high-quality text to speech system suitable for such applications in a Linux
environment, available as free software.



_______________________________________________
gnuspeech-contact mailing list
gnuspeech-contact@...
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact

Re: Thoughts on GNUSpeech and possible accessibility applications

by Marcelo Yassunori Matuda

Hi Jason,

On Mon, Apr 6, 2009 at 7:03 AM, Jason White <jason@...> wrote:
> Since the text to speech system doesn't run under GNU/Linux yet, I haven't
> been able to test it. However, the paper and sample files at
> http://pages.cpsc.ucalgary.ca/~hill/papers/avios95/menu.htm
> were very useful.

GnuSpeech runs on Linux, but you need to install GNUstep. The source
is available at:
svn://svn.sv.gnu.org/gnuspeech/gnustep/trunk

I may create a source package and send it to you.
Which distribution do you use?

Regards,
Marcelo


Re: Thoughts on GNUSpeech and possible accessibility applications

by Jason White-14

Marcelo Yassunori Matuda <marcelo.matuda@...> wrote:
> GnuSpeech runs on Linux, but you need to install GNUstep. The source
> is available at:
> svn://svn.sv.gnu.org/gnuspeech/gnustep/trunk

Thanks. I'll check that out.

Has the text to speech daemon been ported yet? The difficulty is that I
wouldn't be able to operate the interactive (graphical) environment using
either braille or speech, so what I really need is access from the shell.
>
> I may create a source package and send it to you.
> Which distribution do you use?

Debian Sid, with reasonably regular upgrades.

By way of background, I know C from having read K&R some years ago, but not
Objective-C. I don't have any background in complex analysis or digital signal
processing, but would like to study more mathematics at some point in the
future just for fun. Currently I am writing a Ph.D. thesis in contemporary
philosophy of language (i.e., foundational issues in analytic semantics). I
studied some phonetics as an undergraduate back in the early 90s, which may
help in these discussions, depending on how much I can remember.




Re: Thoughts on GNUSpeech and possible accessibility applications

by Marcelo Yassunori Matuda

Hi Jason,

On Mon, Apr 6, 2009 at 8:58 PM, Jason White <jason@...> wrote:
> Has the text to speech daemon been ported yet? The difficulty is that I
> wouldn't be able to operate the interactive (graphical) environment using
> either braille or speech, so what I really need is access from the shell.

No, but there is a command line tool that synthesizes speech.

I have tested in Debian Lenny the following procedure:

(I recommend that you create another user, e.g. gnuspeech, and follow
the procedure inside /home/gnuspeech.)

Run as root:
apt-get install gnustep gnustep-devel
apt-get install portaudio19-dev
apt-get install libgdbm-dev

The remaining steps may be run as the user gnuspeech. The files will
be installed in ~/GNUstep and ~/Library.

mkdir temp
cd temp
svn co svn://svn.sv.gnu.org/gnuspeech/gnustep/trunk
cd trunk

Edit the file Frameworks/GnuSpeech/TextProcessing/GSDBMPronunciationDictionary.h
and replace:
#include <ndbm.h>
with:
#include <gdbm-ndbm.h>

. /usr/share/GNUstep/Makefiles/GNUstep.sh
./install.sh

If you receive an "Ok", the installation has been successful.

cd Tools/GnuSpeechCLI/
./gnuspeechcli.sh hello world

Monet, the GUI editor, is not working in this Debian release, but the
command line util is ok.
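
The manual header edit in the procedure above could also be scripted. The following is only a minimal sketch in Python, assuming it is run from the checked-out trunk directory and that the file path is the one given above; the names patch_include and patch_file are hypothetical helpers, not part of the GnuSpeech sources.

```python
from pathlib import Path

# Path from the procedure above, relative to the checked-out "trunk" directory.
HEADER = Path("Frameworks/GnuSpeech/TextProcessing/GSDBMPronunciationDictionary.h")

OLD_INCLUDE = "#include <ndbm.h>"
NEW_INCLUDE = "#include <gdbm-ndbm.h>"


def patch_include(source: str) -> str:
    """Return the header text with the ndbm include replaced by gdbm-ndbm."""
    return source.replace(OLD_INCLUDE, NEW_INCLUDE)


def patch_file(path: Path = HEADER) -> bool:
    """Rewrite the header in place; return True only if the text changed."""
    original = path.read_text()
    patched = patch_include(original)
    if patched == original:
        return False
    path.write_text(patched)
    return True
```

Calling patch_file() before running ./install.sh would replace the manual edit; running it a second time changes nothing, because the old include line no longer occurs in the file.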

p.s. Why are you running Sid and not "testing" or "stable"?

Regards,
Marcelo


Re: Thoughts on GNUSpeech and possible accessibility applications

by Dalmazio Brisinda

> Marcelo Yassunori Matuda <marcelo.matuda@...> wrote:
>> GnuSpeech runs on Linux, but you need to install GNUstep. The source
>> is available at:
>> svn://svn.sv.gnu.org/gnuspeech/gnustep/trunk
>
> Thanks. I'll check that out.
>
> Has the text to speech daemon been ported yet? The difficulty is  
> that I
> wouldn't be able to operate the interactive (graphical) environment  
> using
> either braille or speech, so what I really need is access from the  
> shell.

Jason,

The text to speech daemon has been partially ported to OS X. By  
"partially" I mean given text it generates synthesized output, but  
there is currently no intonation (something I intend to remedy  
shortly) or other more sophisticated features. It's also accessible  
via OS X services.

I'm not sure about the port to Linux though, as the text to speech  
daemon uses the OS X distributed objects architecture, and I don't  
know to what degree this same architecture is supported on the Gnustep/
Linux platform. Needless to say this elegant IPC/RPC architecture  
makes writing servers/daemons quite straightforward on OS X.

There was also an accessibility application called Touch 'n Talk
written for the Nextstep platform some years ago, that used a graphics  
tablet, and textured overlay that allowed visually impaired users to  
browse through text documents, with a visual representation of the  
tablet and activities occurring on the tablet. However, it seems that  
the last version of the software currently in the nextstep  
distribution tree was not the latest working version on the Nextstep  
platform. So this would probably need to be tracked down before a  
proper port could begin on either the gnustep or osx platforms.

(I have an old crashed internal hard drive which I think holds this  
information -- the latest Nextstep Touch 'n Talk -- but I've never had the
opportunity to try data recovery of this information... and I'm not  
sure if it exists archived somewhere else -- I would imagine so.  
Possibly on one of the old Nextstep machines or old backup systems if  
they are still around. David, the project administrator, might know  
more about this).

Best,
dalmazio



Re: Re: Thoughts on GNUSpeech and possible accessibility applications

by Jason White-14

Dalmazio Brisinda <dbrisinda@...> wrote:

> The text to speech daemon has been partially ported to OS X. By  
> "partially" I mean given text it generates synthesized output, but there
> is currently no intonation (something I intend to remedy shortly) or
> other more sophisticated features. It's also accessible via OS X
> services.

The ability to silence speech very quickly and start speaking a new message is
a particularly important additional feature that would be required for
accessibility applications.

Also desirable is the capacity to track which word is currently being spoken
in text already submitted by the application. Control over such matters as the
handling of punctuation characters (whether to announce them or simply process
them as influences upon the pausing and intonation) would need to be
controllable via the API, as would speech rate and any other tunable
parameters.

I mention these factors not as an attempt to influence development priorities
(which are of course entirely at the discretion of whoever is doing the work),
but as a synopsis of the kinds of API features that would be needed.
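
As a rough illustration of the control surface just described (fast interruption, word tracking, punctuation handling and speech rate), here is a minimal sketch in Python. Every name in it (PunctuationMode, Settings, SpeechQueue and its methods) is hypothetical and not part of GnuSpeech or any existing speech server; no audio is produced, and the synthesizer is simulated by stepping through words.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PunctuationMode(Enum):
    SPOKEN = "spoken"      # announce punctuation characters by name
    PROSODIC = "prosodic"  # use punctuation only to shape pausing and intonation


@dataclass
class Settings:
    rate_wpm: int = 180  # arbitrary illustrative default
    punctuation: PunctuationMode = PunctuationMode.PROSODIC


class SpeechQueue:
    """Toy model of a speech server: only the control surface, no audio."""

    def __init__(self, settings: Optional[Settings] = None) -> None:
        self.settings = settings or Settings()
        self._words: List[str] = []
        self._position = 0

    def speak(self, text: str, interrupt: bool = True) -> None:
        """Submit text; with interrupt=True, discard anything still pending."""
        if interrupt:
            self.stop()
        self._words.extend(text.split())

    def stop(self) -> None:
        """Silence speech immediately by dropping all queued words."""
        self._words = []
        self._position = 0

    def advance(self) -> Optional[str]:
        """Simulate speaking the next word; return it, or None when idle."""
        if self._position >= len(self._words):
            return None
        word = self._words[self._position]
        self._position += 1
        return word

    def current_index(self) -> int:
        """Number of words spoken so far in the submitted text."""
        return self._position
```

In this sketch a screen reader would call speak() with the default interrupt=True whenever the user moves to new content, and poll current_index() to keep a braille display or cursor in step with the speech.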
>
> I'm not sure about the port to Linux though, as the text to speech  
> daemon uses the OS X distributed objects architecture, and I don't know
> to what degree this same architecture is supported on the Gnustep/Linux
> platform. Needless to say this elegant IPC/RPC architecture makes writing
> servers/daemons quite straightforward on OS X.


http://www.gnustep.org/resources/documentation/Developer/Base/ProgrammingManual/manual_7.html

This appears promising.



Re: Re: Thoughts on GNUSpeech and possible accessibility applications

by David Hill-6

Hi Jason,

On Apr 7, 2009, at 5:59 PM, Jason White wrote:

> Dalmazio Brisinda <dbrisinda@...> wrote:
>
>> The text to speech daemon has been partially ported to OS X. By
>> "partially" I mean given text it generates synthesized output, but there
>> is currently no intonation (something I intend to remedy shortly) or
>> other more sophisticated features. It's also accessible via OS X
>> services.
>
> The ability to silence speech very quickly and start speaking a new message is
> a particularly important additional feature that would be required for
> accessibility applications.
>
> Also desirable is the capacity to track which word is currently being spoken
> in text already submitted by the application. Control over such matters as the
> handling of punctuation characters (whether to announce them or simply process
> them as influences upon the pausing and intonation) would need to be
> controllable via the API, as would speech rate and any other tunable
> parameters.

You may be interested to check out the paper that describes the "Touch 'n Talk" system that Dalmazio mentioned in an earlier email.  The direct link is:

http://pages.cpsc.ucalgary.ca/~hill/papers/ieee-touch-n-talk-1988.pdf

it is an item on my university web site to which you can also navigate.

I wish it were all up and working right now. It was very disappointing to lose it, along with all the other stuff we had developed, when NeXT went belly up, but getting it up again is one of the goals towards which we are headed. It would meet (and hopefully exceed) the requirements you have started to outline.

[snip]

All good wishes.

david
----------
David Hill
--------
 The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------


Re: Re: Thoughts on GNUSpeech and possible accessibility applications

by Jason White-14

David Hill <drh@...> wrote:

> You may be interested to check out the paper that describes the "Touch 'n
> Talk" system that Dalmazio mentioned in an earlier email.  The direct
> link is:
>
> http://pages.cpsc.ucalgary.ca/~hill/papers/ieee-touch-n-talk-1988.pdf
>
> it is an item on my university web site to which you can also navigate.

Thank you, David, for the reference. This is an interesting paper. It also
reminds me of a related solution, developed at approximately the same time, by
Jim Thatcher for IBM Screen Reader, in which a separate key pad was used for
reading and navigational functions. Although I never had an opportunity to use
it, I recall that one of the principal difficulties in the early versions was
said to be that the system wouldn't automatically read new text presented on
screen, or read text in response to cursor movement - the users had to switch
frequently between the qwerty keyboard and the screen reader's key pad while
interacting with the application software.

The research in which I, personally, find the most insight is that by T.V.
Raman, first in his AsteR software (Audio System for Technical Readings:
http://www.cs.cornell.edu/home/raman/) and then in Emacspeak
(http://emacspeak.sourceforge.net/ and for the latest source code,
http://emacspeak.googlecode.com/).

Emacspeak works best with synthesizers that allow changes to be made
dynamically to voice characteristics, for example the DECTalk, and it would be
interesting to know whether GNUSpeech might eventually support such audio
formatting techniques.
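
To make the idea of audio formatting concrete, here is a minimal sketch in Python of how spans of text might be paired with different voice settings. It is an illustration only and not how Emacspeak or the DECTalk actually work: the Voice fields, the style names and the numeric offsets are all hypothetical, chosen only to show the mapping.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Voice:
    """Illustrative voice parameters; a real synthesizer exposes its own set."""
    pitch_hz: int = 120
    rate_wpm: int = 180


# Hypothetical style rules: each maps the base voice to the voice for a span.
STYLE_RULES: Dict[str, Callable[[Voice], Voice]] = {
    "plain": lambda v: v,
    "emphasis": lambda v: replace(v, pitch_hz=v.pitch_hz + 20),
    "heading": lambda v: replace(v, pitch_hz=v.pitch_hz - 15, rate_wpm=v.rate_wpm - 30),
}


def audio_format(
    spans: List[Tuple[str, str]], base: Voice = Voice()
) -> List[Tuple[str, Voice]]:
    """Pair each (style, text) span with the voice it should be spoken in."""
    return [(text, STYLE_RULES[style](base)) for style, text in spans]
```

A synthesizer supporting this would need to accept such voice changes between one span and the next without an audible restart, which is the dynamic behaviour referred to above.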

Further, in his latest work at Google on the accessibility of mobile
telephones, Raman has devised a means of making touch screen input achievable
in an "eyes-free" context.

I trust that this digression into speech interface research is not unwelcome
on the list; to ensure that it remains on topic, I have sought to connect it
to functional requirements of a text to speech system.

I think there is a need for a free (as in freedom) tts system capable of
supporting the products of past and current speech interface research, while,
just as importantly, providing opportunities for future research and free
software development efforts.

I also agree with David's observation that many of the most important
requirements are already treated in his 1988 paper, although advances such as
Raman's "audio formatting" techniques create additional, desirable features.



Re: Thoughts on GNUSpeech and possible accessibility applications

by Jason White-14

Marcelo Yassunori Matuda <marcelo.matuda@...> wrote:
 
> I have tested in Debian Lenny the following procedure:
[snip]

Thank you for documenting the compilation and installation procedure, which I
was able to follow successfully.

> p.s. Why are you running Sid and not "testing" or "stable"?

I am trying to keep up to date with Gnome and related packages so as to use,
and test, Orca. I have found Sid to be generally reliable, with occasional,
short-lived, problems, but perhaps I should really be using Testing instead.
I usually just downgrade the occasional troublesome package. If Sid starts
living up to its name more often, I'll switch to Testing.



Re: Re: Thoughts on GNUSpeech and possible accessibility applications

by David Hill-14

Hi Jason,

On Apr 7, 2009, at 9:06 PM, Jason White wrote:

> David Hill <drh@...> wrote:
>
>> You may be interested to check out the paper that describes the  
>> "Touch 'n
>> Talk" system that Dalmazio mentioned in an earlier email.  The direct
>> link is:
>>
>> http://pages.cpsc.ucalgary.ca/~hill/papers/ieee-touch-n-talk-1988.pdf
>>
>> it is an item on my university web site to which you can also  
>> navigate.
>
> Thank you, David, for the reference. This is an interesting paper.  
> It also
> reminds me of a related solution, developed at approximately the  
> same time, by
> Jim Thatcher for IBM Screen Reader, in which a separate key pad was  
> used for
> reading and navigational functions. Although I never had an  
> opportunity to use
> it, I recall that one of the principal difficulties in the early  
> versions was
> said to be that the system wouldn't automatically read new text  
> presented on
> screen, or read text in response to cursor movement - the users had  
> to switch
> frequently between the qwerty keyboard and the screen reader's key  
> pad while
> interacting with the application software.

This was not a limitation with Touch-'n-Talk which was designed to  
integrate control and access within a single haptic-auditory  
interface in as natural a way as possible.  We made a direct  
comparison between our system and a conventional key-operated talking  
terminal and our target population of blind users preferred the  
"Touch-'n-Talk" system, and the results for all users (there were  
five blind subjects and twelve sighted but blindfolded subjects) were  
comparable, which was a useful result since it implies that tests  
using blindfolded subjects can be used, at least in exploratory  
evaluations.

>
> The research in which I, personally, find the most insight is that  
> by T.V.
> Raman, first in his AsteR software (Audio System for Technical  
> Readings:
> http://www.cs.cornell.edu/home/raman/) and then in Emacspeak
> (http://emacspeak.sourceforge.net/ and for the latest source code,
> http://emacspeak.googlecode.com/).

I checked out some of Dr. Raman's examples -- apparently using  
DECTalk.  Audio formatting has the advantage that no special  
equipment is required.

But imagine if you could access those mathematical formulae by using  
your finger, and being able to feel the spatial relationships between  
the components with your hand and fingers, checking individual  
characters and manipulating a mark as well as a cursor.  "Touch-'n-Talk"
was deliberately designed to make spatial cues available without
the need for sight, and to allow bookmarks, words, paragraphs, pages,
spelling, searching and so on to be handled easily.

>
> Emacspeak works best with synthesizers that allow changes to be made
> dynamically to voice characteristics, for example the DECTalk, and  
> it would be
> interesting to know whether GNUSpeech might eventually support such  
> audio
> formatting techniques.

Having an articulatory synthesiser means that many different voices  
can be created dynamically, from child voices through female, to  
male.  Having said that, not all voice characteristics are well  
understood, and not just for excellent female voices.

>
> Further, in his latest work at Google on the accessibility of mobile
> telephones, Raman has devised a means of making touch screen input  
> achievable
> in an "eyes-free" context.

Being not just "eyes-free" but providing equivalent facilities using  
touch and sound were basic design criteria for "Touch-'n-Talk" as you  
obviously realise from reading the paper.


>
> I trust that this digression into speech interface research is not  
> unwelcome
> on the list; to ensure that it remains on topic, I have sought to  
> connect it
> to functional requirements of a text to speech system.
>
> I think there is a need for a free (as in freedom) tts system  
> capable of
> supporting the products of past and current speech interface  
> research, while,
> just as importantly, providing opportunities for future research  
> and free
> software development efforts.
>
> I also agree with David's observation that many of the most important
> requirements are already treated in his 1988 paper, although  
> advances such as
> Raman's "audio formatting" techniques create additional, desirable  
> features.
>
>

I am not sure what was missing compared to Raman's approach.  If you  
have spatial references through touch, the changing pitch is more of  
a distraction than a help, IMHO.

Warm regards.

david

-------------


