Early GNUSpeech observations

by Jason White

Having run the (very early) GNU/Linux version, I wish to congratulate the
authors of GNUSpeech for having advanced the porting effort this far.

I notice that the version which I built doesn't pronounce names and relatively
uncommon words - perhaps it is restricted to pronouncing words that are in its
dictionary. I hear a "zzz" sound in place of each omitted word.

Have the letter to sound rules not been ported yet, or is it just a bug? I
think it is important for any synthesizer to have good letter to sound rules,
since there will inevitably be words in the text that aren't in the
dictionary.

I also find the intonation pattern interesting, and quite different from the
original samples, but I'm sure that improving it is on the list of tasks to be
completed. It also seems to me that the tonal quality of the voice is better
than that of the sound files that David generously supplied on his Web site,
but this might be entirely my imagination.

Thank you, again, for the excellent work so far.



_______________________________________________
gnuspeech-contact mailing list
gnuspeech-contact@...
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact

Re: Early GNUSpeech observations

by David Hill

Hi Jason,

On Apr 8, 2009, at 1:24 AM, Jason White wrote:

> Having run the (very early) GNU/Linux version, I wish to congratulate the
> authors of GNUSpeech for having advanced the porting effort this far.

Thank you.

>
> I notice that the version which I built doesn't pronounce names and relatively
> uncommon words - perhaps it is restricted to pronouncing words that are in its
> dictionary. I hear a "zzz" sound in place of each omitted word.

This sound was put in there deliberately by Steve Nygard to make sure
it was clearly understood that the system was not dealing with parts
of the input, because (as you guessed) the parser (which does all kinds
of things including dictionary derivatives, arranging numbers and
dates to be spoken in the way people speak them, and so on) is by no
means completely ported. It is probably the very next job because it
makes a big difference to the overall quality of the spoken output.

There is a letter-to-sound component in there, and it should be
functioning. I don't think that's what causes the funny "zzzzzt"
noises. The rules are not very good, but they are normally not used
much: with a 70,000-word dictionary of hand-crafted pronunciations,
and facilities for a lot of derivative words, the letter-to-sound
rules are rarely called in the complete system. They are based on
work by McIlroy at Bell Labs.
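The lookup order described here is: hand-crafted dictionary first, then derivative words, then letter-to-sound rules as a rarely used last resort. A small Python sketch can illustrate it. This is not Gnuspeech's parser code. The function names, the toy dictionary, the underscore-separated "phone" notation, and the `zzz` placeholder for input the incomplete port cannot handle are all hypothetical illustrations.

```python
from typing import Callable, Dict, Optional

# Hypothetical marker for input the (incomplete) pipeline cannot handle,
# standing in for the deliberate "zzz" noise.
PLACEHOLDER = "zzz"

Deriver = Callable[[str, Dict[str, str]], Optional[str]]


def pronounce(word: str,
              dictionary: Dict[str, str],
              derive: Deriver,
              letter_to_sound: Optional[Callable[[str], str]] = None) -> str:
    """Return a pronunciation: dictionary, then derivatives, then LTS rules."""
    key = word.lower()
    # 1. Hand-crafted dictionary entry: the common case.
    if key in dictionary:
        return dictionary[key]
    # 2. Derivative of a dictionary word (plural, past tense, ...).
    derived = derive(key, dictionary)
    if derived is not None:
        return derived
    # 3. Letter-to-sound rules: rarely reached with a large dictionary.
    if letter_to_sound is not None:
        return letter_to_sound(key)
    # Nothing could handle the word: emit the audible placeholder.
    return PLACEHOLDER


def strip_plural_s(word: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Toy derivation rule: "cats" -> entry for "cat" plus an "s" phone.

    Real derivation also handles voicing ("dogs" ends in /z/), past
    tenses, and so on; this hypothetical rule ignores all of that.
    """
    if word.endswith("s") and word[:-1] in dictionary:
        return dictionary[word[:-1]] + "_s"
    return None


toy_dict = {"cat": "k_a_t", "dog": "d_o_g"}
print(pronounce("cats", toy_dict, strip_plural_s))     # k_a_t_s
print(pronounce("Calgary", toy_dict, strip_plural_s))  # zzz
```

Passing a `letter_to_sound` function changes the result for unknown words: they get some pronunciation instead of the placeholder. The deliberately naive `lambda w: "_".join(w)` is one such function. In the complete system, the large dictionary keeps that last branch rarely used.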

>
> Have the letter to sound rules not been ported yet, or is it just a bug? I
> think it is important for any synthesizer to have good letter to sound rules,
> since there will inevitably be words in the text that aren't in the
> dictionary.

Also, the dictionary should be expanded -- a project that got put on
hold when NeXT and the NeXT software disappeared. All sorts of proper
names/nouns need to be added, including city and country names,
people's names, and so on. It has been more important recently to
get the basic software up on GNU/Linux and the Mac.

>
> I also find the intonation pattern interesting, and quite different from the
> original samples, but I'm sure that improving it is on the list of tasks to be
> completed. It also seems to me that the tonal quality of the voice is better
> than that of the sound files that David generously supplied on his Web site,
> but this might be entirely my imagination.

Again, the intonation rules, based on M.A.K. Halliday's
intonation scheme for British English, were being refined. Craig
[Taube-] Schock wrote his thesis on the topic under my supervision
("Intonation for Computer Speech Output" -- University of Calgary
Dept. of Computer Science 1993) and received the Governor General's
Gold Medal for it, but the method had already been greatly improved
when we released the new articulatory synthesis software in 1994-5.

The "Lumberjack" and "The Chaos" speech samples on my university web
site under "Gnuspeech material" are the untouched results of putting
punctuated text into the original NeXTStep version of Gnuspeech
(known then as the Trillium TextToSpeech kit). The "Pat-a-pan"
sample was a Christmas teaser composed by our PhD musician Leonard
Manzara for Christmas 1994. There are no instruments in the piece,
which simulates an old Burgundian carol sung in four parts by 16
voices in an auditorium 30 feet square, with reverberation supplied
by Leonard's acoustic imaging software (part of his PhD work). I
attach a short write-up on that piece for convenience.

Hope this helps.


>
> Thank you, again, for the excellent work so far.

Your encouragement is much appreciated.

Warm regards.

david

---------

Pat-a-pan (only the first verse of this old Burgundian carol is  
synthesised)

Note that there is no instrumental accompaniment in this synthesis,  
only voice harmony.

[The sound files are on my university web site:
http://pages.cpsc.ucalgary.ca/~hill under "Gnuspeech material"]

God and man this day are one,
Even more than fife and drum;
So these instruments we play,
Tu-re-lu-re-lu, pat-a-pat-a-pan,
So these instruments we play
For a joyful Christmas day!

This synthesis was produced as a pre-Christmas teaser for advertising
purposes for Trillium Sound Research Inc in 1994. There are 16
unaccompanied male voices in four parts, arranged by Leonard Manzara
and located in a virtual hall 20 metres by 30 metres using acoustic
imaging software developed by Leonard for the technical part of his
doctoral thesis in music from SUNY at Buffalo (Manzara 1990).
Because it is a carol, the rhythm and intonation for the four parts
are musically determined and not composed by the rhythm and
intonation rules used for the other examples. Some variation was
introduced between voices singing the same parts. Only the sopranos
sing the lyrics above; the other parts sing “pat-a-pan” in various
ways. The composition was completed before the system was finalised,
so there are some deficiencies, notably in the balance between voiced
and unvoiced sound. The sixteen different voices and acoustic imaging
required significant effort, which has not been repeated since the
system achieved release status.

References

Manzara LC (1990) The simulation of acoustical space by means of  
physical modeling. PhD Dissertation, Faculty of the Graduate School  
of the State University of New York at Buffalo

Re: Early GNUSpeech observations

by Jason White

David Hill <hilld@...> wrote:

> This sound was put in there deliberately by Steve Nygard to make sure it
> was clearly understood that the system was not dealing with parts of the
> input, because (as you guess) the parser (which does all kinds of things
> including dictionary derivatives, arranging numbers and dates to be
> spoken in the way people speak them, and so on) is by no means completely
> ported.  It is probably the very next job because it makes a big
> difference to the overall quality of the spoken output.

Thank you for the explanation, which is much appreciated. Now I understand why
plurals, some past tenses, and certain names are omitted from the
output. I also notice that many of the function declarations in
parser_module.m have been commented out, indicating that porting is still in
progress.

> Also, the dictionary should be expanded -- a project that got put on  
> hold when the NeXT & NeXT software disappeared.  All sorts of proper  
> names/nouns need to be added, including city and country names, people's
> names, and so on.  It has been more important recently to get the basic
> software up on GNU/Linux and the Mac.

Indeed so. Bringing the dictionary into accord with the phonetics of the
synthesizer would, I think, improve the spoken output markedly; the
pronunciation of the "r" and "l" sounds, for example, needs to be addressed.

> Again, the intonation rules, based on M.A.K. Halliday's intonation
> scheme for British English, were being refined.  Craig [Taube-] Schock
> wrote his thesis on the topic under my supervision ("Intonation for
> Computer Speech Output" -- University of Calgary Dept. of Computer
> Science 1993) and received the Governor General's Gold Medal for it, but
> the method had already been greatly improved when we released the new
> articulatory synthesis software in 1994-5.

In the current, partially ported, version the intonation tends to rise sharply
until punctuation is encountered. I have noticed, however, that the parsing
code inserts markers into the phonetic string which is forwarded to the rest
of the synthesizer for processing, and I surmise that these affect, among
other parameters, the intonation. Thus I will listen again when the parser has
been ported and refined.

With thanks and regards,

Jason.



Re: Early GNUSpeech observations

by David Hill

Hi Jason,

On Apr 9, 2009, at 7:37 PM, Jason White wrote:

> In the current, partially ported, version the intonation tends to rise sharply
> until punctuation is encountered. I have noticed, however, that the parsing
> code inserts markers into the phonetic string which is forwarded to the rest
> of the synthesizer for processing, and I surmise that these affect, among
> other parameters, the intonation. Thus I will listen again when the parser has
> been ported and refined.

A quick comment on this observation of yours.  In Monet, there is an "Intonation Parameters" tool, which allows you to change the parameters of the intonation.  If you reduce the "Pretonic Range" to (say) 3, from 5, the speech doesn't sound so frenetic.  You can play around with all the parameters which alter ranges, notional pitch and such for the various components of the Halliday intonation framework for Spoken English.

The latest version of Monet has two input windows.  One is for ordinary text, and the other shows the resulting Monet "phonetic" input, with various added information for the intonation if you select "Parse Text".  When you "Synthesize", it is the "phonetic" input that gets spoken.

We're working on various aspects of the whole gnuspeech suite as you know.  I have just made a 0.7 version of "Synthesizer" available and Dalmazio has put it into the SVN repository (thanks Dalmazio).  "Synthesizer", you may remember, is an application to allow a user to experiment with the raw tube model that provides the vocal tract emulation.  It is also a tool that is needed when creating the synthesis databases for a new language using Monet.

All good wishes.

david
----------
David Hill
--------
 The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------


Re: Early GNUSpeech observations

by Marcelo Yassunori Matuda

Hi Jason,

On Tue, Apr 21, 2009 at 9:46 PM, David Hill <drh@...> wrote:
> In Monet, there is an
> "Intonation Parameters" tool, which allows you to change the parameters of
> the intonation.  If you reduce the "Pretonic Range" to (say) 3, from 5, the
> speech doesn't sound so frenetic.  You can play around with all the
> parameters which alter ranges, notional pitch and such for the various
> components of the Halliday intonation framework for Spoken English.

In GnuSpeechCLI, there is an XML file named config.plist. You may
adjust the parameters there, for example:

IntonationNotionalPitch
IntonationPretonicRange
IntonationPretonicLift
IntonationTonicRange
IntonationTonicMovement
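The same adjustment could also be scripted for the command-line tool. The sketch below uses Python's standard `plistlib` module. It assumes, without having checked GnuSpeechCLI itself, that config.plist is an XML property list whose top-level dictionary holds the keys listed above as numbers. The helper name `set_intonation` is hypothetical. Note that `plistlib` reads XML and binary plists but not GNUstep's old-style text plist format.

```python
import plistlib
from pathlib import Path

# Intonation keys mentioned on the list (assumed, not necessarily exhaustive).
INTONATION_KEYS = {
    "IntonationNotionalPitch",
    "IntonationPretonicRange",
    "IntonationPretonicLift",
    "IntonationTonicRange",
    "IntonationTonicMovement",
}


def set_intonation(path: Path, **changes: float) -> dict:
    """Update intonation values in a plist file and write it back as XML.

    Assumes the file's top-level object is a dictionary.
    """
    # Validate first so a typo never leaves a half-edited file.
    unknown = set(changes) - INTONATION_KEYS
    if unknown:
        raise KeyError(f"not intonation keys: {sorted(unknown)}")
    with path.open("rb") as f:
        config = plistlib.load(f)  # detects XML or binary format
    config.update(changes)
    with path.open("wb") as f:
        plistlib.dump(config, f)  # XML is plistlib's default output format
    return config
```

For example, `set_intonation(Path("config.plist"), IntonationPretonicRange=3.0)` would mirror the Monet suggestion of lowering the pretonic range from 5 to 3. All other keys in the file are preserved.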

Regards,
Marcelo

