On Nov 10, 2008, at 9:13 PM, Dalmazio Brisinda wrote:
Very cool! Much of this sounds quite similar to what we were discussing over a month ago: computing the resonant function from a radial interpolation function that changes smoothly along the tube, especially at the region boundaries. In this case, they use MRI data for this function.
As I mentioned before, the snag is that more sections would be needed, the sample rate would increase, and computation speed would likely be an issue again.
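The scaling behind that snag can be sketched roughly as follows. This assumes a waveguide-style tube in which each section is exactly one sample period long, so the sample rate equals sections times c / length. The 17.5 cm tube length, 350 m/s sound speed, and 10-section baseline are illustrative values, and the function names are hypothetical rather than taken from the gnuspeech source.

```python
# Sketch only. ASSUMPTION: each tube section is one sample period long,
# so fs = n_sections * c / tube_length. All default values are illustrative.

def sample_rate_hz(n_sections, tube_length_m=0.175, speed_of_sound_m_s=350.0):
    """Sample rate needed when sound crosses one section per sample."""
    section_length_m = tube_length_m / n_sections
    return speed_of_sound_m_s / section_length_m


def relative_cost(n_sections, baseline_sections=10):
    """Per-second work relative to the baseline: sections x sample rate."""
    work = n_sections * sample_rate_hz(n_sections)
    baseline_work = baseline_sections * sample_rate_hz(baseline_sections)
    return work / baseline_work


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, round(sample_rate_hz(n)), round(relative_cost(n), 2))
```

Under these assumptions, doubling the number of sections doubles the sample rate and roughly quadruples the per-second work, which is why computation speed becomes an issue again.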
Just had a slightly playful thought: I wonder if there is MRI data for samples limited to aesthetically pleasing voices, male and female separately. I'm sure there would be some physiological differences between averaging MRI data over a large 'random' sample and limiting it to just 'attractive' samples.
Voice quality has more to do with the glottal excitation function (including intonation) than with vocal tract shape, though some vocal tract effects are pleasing, such as *clarity* of articulation, on which we still don't have a good handle. Some speakers seem to maximise clarity by adjusting their formants for best effect, though not in a voluntary way. I got that from Walter Lawrence himself.
So, I'm curious, what were the subjective results like? I would suspect much smoother-sounding synthesis, and therefore greater intelligibility.
Good topic for a PhD thesis :-) Intelligibility is not synonymous with better-quality synthesis. DECtalk (MITalk) is pretty intelligible but very tiring to listen to for long periods, which is probably due in large part to the unnatural rhythm and intonation.
That shifting of the zero-crossings/DRM boundaries towards the lips is also interesting: a 60/40 weighting of the length of the back half vs. the length of the front half. Is anyone looking into incorporating these two changes into gnuspeech? The 60/40 weighting change would probably not be too difficult. Using MRI data to create a non-uniform radial function sounds a little more involved, but very interesting!
The real point is that the "rest" state of the tube is non-uniform, but it produces similar formants to a uniform tube. This means the boundaries of the tube DRM regions are shifted from the original theory and the radii have to be different in the rest state. This almost certainly means, again, more sections are needed. It would not be that easy but needs to be looked at.
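For reference, the uniform-tube case that the original theory starts from can be sketched as below. The formant formula and the sensitivity-function zero crossings are the standard results for an ideal lossless tube closed at the glottis and open at the lips. The piecewise-linear 60/40 remapping is only one assumed way of moving the midpoint towards the lips, not the method from the work under discussion, and all names and default values are hypothetical.

```python
from fractions import Fraction


def uniform_tube_formants(length_m=0.175, c_m_s=350.0, count=3):
    """Quarter-wave resonances of an ideal closed-open uniform tube, in Hz."""
    return [(2 * n - 1) * c_m_s / (4.0 * length_m) for n in range(1, count + 1)]


def drm_boundaries(count=3):
    """Region boundaries as fractions of tube length, measured from the glottis.

    For formant n, the uniform-tube sensitivity function varies as
    cos((2n - 1) * pi * x / L), so it crosses zero at
    x / L = (2m + 1) / (2 * (2n - 1)) for m = 0 .. 2n - 2.
    """
    points = set()
    for n in range(1, count + 1):
        k = 2 * n - 1
        for m in range(k):
            points.add(Fraction(2 * m + 1, 2 * k))
    return sorted(points)


def shift_midpoint(x, new_mid=Fraction(3, 5)):
    """ASSUMPTION: piecewise-linear warp that moves the midpoint to new_mid.

    With new_mid = 3/5 the back half takes 60% of the length and the
    front half 40%, which moves every boundary towards the lips.
    """
    half = Fraction(1, 2)
    if x <= half:
        return x * new_mid / half
    return new_mid + (x - half) * (1 - new_mid) / (1 - half)


if __name__ == "__main__":
    print([round(f) for f in uniform_tube_formants()])
    nominal = drm_boundaries()
    print([str(b) for b in nominal])
    print([str(shift_midpoint(b)) for b in nominal])
```

With the first three formants, the zero crossings fall at 1/10, 1/6, 3/10, 1/2, 7/10, 5/6 and 9/10 of the tube length, giving eight regions. A non-uniform rest state would need its own boundaries and rest radii, which this sketch does not attempt to model.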
Warm regards.