A robust unit selection system for speech synthesis


Author(s) : Alistair Conkie
Publisher : N/A
Publication Date : 1999
ISSN : N/A
Abstract : There has been much interest for many years in diphone-based concatenative speech synthesis and, recently, a rapidly increasing interest in unit-selection-based synthesis (as illustrated by the CHATR [2] system). However, the limitations of both types of system are well known. While intelligibility is generally very high for diphone-based systems, the resulting signals do not sound completely natural. This happens for several reasons, among them the limited number of phone variants present in a typical system and the potential artifacts introduced by concatenating at diphone boundaries. For unit selection synthesis, typically phone-based, it is possible to produce sentences that sound surprisingly natural and intelligible from a large database. However, quality is often inconsistent, and the main difficulty appears to be selecting acoustically appropriate units with the correct prosodic characteristics. Also, note that typically no prosody modification is done to achieve the highest possible quality. In an effort to capture the best features of both systems, we have devised a unit-selection and synthesis algorithm that allows finer control than the CHATR system (version 0.8), both by applying selective prosody modification and by exercising finer control over the units that get chosen for synthesis. We present the algorithm and results of experiments based on our own version of unit selection synthesis.