|
Abstract : |
In this paper we introduce MULTIVOC, a real-world text-to-speech product geared to the French language. Starting from an ordinary French text, MULTIVOC generates high-quality speech in real time using a synthesis-by-diphone method. The processing is divided into three main transformations: phonetization, automatic prosody and rhythm marking, and generation of LPC frames. This paper provides a full description of MULTIVOC, covering not only the technical view but also some real-world applications of the product.

1. PRESENTATION OF MULTIVOC

The MULTIVOC text-to-speech system is the result of a technology transfer from a research institute (CNET Lannion, France), which developed the basis of the system, to an industrial company (Cap Sogeti Innovation, France), which made the system a commercial product. MULTIVOC generates Linear Prediction Coding (LPC) frames from ordinary text written in French; its goal is to give any standard application the ability to produce low-cost, high-quality speech output in real time. MULTIVOC is shipped as a complete software system that aims to provide a sophisticated driver to which applications can directly send French text to be spoken. The software package consists of the kernel of the driver itself and a set of dictionaries used by it. Several tools in the package allow an advanced user to tailor a MULTIVOC driver to a specific usage. Besides this static configuration facility, MULTIVOC also provides several run-time features. By submitting specific requests, an application can change the following parameters:

• The sampling frequency for generated frames. Three different frequencies are available: 8 kHz, 10 kHz and 16 kHz. This parameter determines the quality of the output voice, a frequency of 16 kHz providing the best results. |