
STTR Phase I: Small Footprint Speech Synthesis

Award Information
Agency: National Science Foundation
Branch: N/A
Contract: 041125
Agency Tracking Number: 0441125
Amount: $99,751.00
Phase: Phase I
Program: STTR
Solicitation Topic Code: IT
Solicitation Number: NSF 04-551
Timeline
Solicitation Year: 2004
Award Year: 2005
Award Start Date (Proposal Award Date): N/A
Award End Date (Contract End Date): N/A
Small Business Information
940 Upper Devon Lane
Lake Oswego, OR 97034
United States
DUNS: N/A
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
Principal Investigator
 Alexander Kain
 Dr.
 (503) 329-9604
 kain@biospeech.com
Business Contact
 Lois Black
Phone: (503) 534-2891
Email: lmblack@cslu.ogi.edu
Research Institution
 Oregon Health & Science University
 Jan P van Santen
 
20000 N.W. Walker Road
Beaverton, OR 97006
United States

 (503) 748-1138
 Nonprofit College or University
Abstract

This Small Business Technology Transfer (STTR) Phase I project aims to develop and implement a new text-to-speech (TTS) synthesis algorithm that will (i) dramatically reduce disk and memory requirements at a given level of speech quality and (ii) minimize the amount of voice recordings needed to create a new synthetic voice. Most current TTS systems operate by concatenating segments of recorded speech (acoustic units). A central challenge for TTS is coarticulation: the dependence of the acoustic realization of a phoneme on its neighbors. Current TTS systems use multi-phone acoustic units such as diphones, which preserve the coarticulatory patterns naturally present in speech. However, this approach requires a large amount of recordings and produces systems with large footprints. Biospeech proposes a uniphone approach that handles coarticulation with an explicit model. The method uses complex spectral vectors (basis vectors) that represent brief segments of speech inside single phonemes, and decomposes each of them into two components: a formant vector and a spectral balance vector. To generate speech, the formant and spectral balance vectors derived from the basis vectors of successive phonemes undergo separate (and hence generally asynchronous) interpolation with time-varying weights; the resulting formant and spectral balance trajectories are recombined into a trajectory in complex spectral space; finally, this trajectory is converted into output speech with the inverse Fourier transform. Asynchrony is needed because the articulators underlying different spectral features (e.g., frication, formant frequencies) act quasi-independently.

First, the proposed work has implications for other speech technologies, including automatic speech recognition (ASR). Current ASR technologies address coarticulation with multi-phone units, typically triphones. English has over 70,000 triphones, so training requires a large amount of recorded speech; the proposed model could dramatically reduce the amount of recordings required for system training. Second, TTS has widely recognized societal benefits for universal access, education, and information access by voice. For example, TTS-based augmentative devices are available for individuals who have lost their voice, and reading machines for the blind have been available for several decades. Third, the approach will make higher-quality TTS available on smaller devices. For example, voice-based caller ID is currently not possible on low-end mobile telephones because of memory limitations. Fourth, it enables voice adaptation with a minimum of recordings. This will make it possible to build personalized TTS systems for individuals with speech disorders who can produce normal speech sounds only intermittently, or for individuals about to undergo surgery that will irreversibly alter their speech. The Biospeech method requires recordings of valid samples of each of the fewer than 50 phonemes, rather than of each of the 2,000 or more diphones.
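The abstract names the processing steps but not their details. The Python sketch below walks through them for a single transition between two phonemes, under assumptions that are ours, not Biospeech's: each basis vector is a one-sided complex spectrum (NumPy `rfft` layout), the time-varying weights are logistic curves, the formant and spectral balance components are recombined by element-wise multiplication, and waveform frames are joined with Hann-windowed overlap-add. All function names, parameter values, and the toy spectra are hypothetical and serve only as illustration.

```python
import numpy as np

N_FFT = 256                # hypothetical frame length in samples
N_BINS = N_FFT // 2 + 1    # bins in a one-sided (rfft-style) spectrum
HOP = N_FFT // 2           # hypothetical hop size for overlap-add


def sigmoid_weights(n_frames, midpoint, steepness=0.5):
    """Weights rising smoothly from ~0 (first phoneme) to ~1 (second phoneme)."""
    t = np.arange(n_frames)
    return 1.0 / (1.0 + np.exp(-steepness * (t - midpoint)))


def interpolate(a, b, w):
    """Frame-by-frame interpolation: row i is (1 - w[i]) * a + w[i] * b."""
    return (1.0 - w)[:, None] * a[None, :] + w[:, None] * b[None, :]


def synthesize_transition(f1, b1, f2, b2, n_frames, formant_mid, balance_mid):
    """Interpolate formant and balance vectors with separate (asynchronous)
    weights, recombine them, and inverse-FFT each spectral frame."""
    w_f = sigmoid_weights(n_frames, formant_mid)
    w_b = sigmoid_weights(n_frames, balance_mid)
    formant_traj = interpolate(f1, f2, w_f)   # complex, shape (n_frames, N_BINS)
    balance_traj = interpolate(b1, b2, w_b)   # real, shape (n_frames, N_BINS)
    spectra = formant_traj * balance_traj     # ASSUMED multiplicative recombination
    frames = np.fft.irfft(spectra, n=N_FFT, axis=1)  # one waveform frame per row
    return frames, w_f, w_b


def overlap_add(frames, hop=HOP):
    """Join waveform frames with a Hann window and overlap-add."""
    n_frames, n = frames.shape
    out = np.zeros(hop * (n_frames - 1) + n)
    window = np.hanning(n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n] += window * frame
    return out


# Toy "basis vectors" for two successive phonemes (made-up values).
bins = np.arange(N_BINS)
phase = np.exp(-1j * np.pi * bins / 4)                 # linear phase (fixed delay)
f1 = np.exp(-0.5 * ((bins - 20) / 4.0) ** 2) * phase   # formant peak near bin 20
f2 = np.exp(-0.5 * ((bins - 40) / 4.0) ** 2) * phase   # formant peak near bin 40
b1 = np.ones(N_BINS)                                   # flat spectral balance
b2 = np.linspace(1.0, 0.2, N_BINS)                     # high frequencies attenuated

frames, w_f, w_b = synthesize_transition(f1, b1, f2, b2, n_frames=24,
                                         formant_mid=8, balance_mid=16)
signal = overlap_add(frames)
```

Because `formant_mid` (8) comes before `balance_mid` (16), at frame 12 the formant weight is about 0.88 while the balance weight is about 0.12: the formant component is mostly that of the second phoneme while the spectral balance is still mostly that of the first. Setting the two midpoints equal reduces the sketch to ordinary synchronous interpolation. The linear cross-fade between spectra is a simplification chosen for brevity.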

* Information listed above is as of the time of submission. *
