Language modeling with limited domain data


Author(s) : Alexander I. Rudnicky
Publisher : N/A
Publication Date : 1995
ISSN : N/A
Abstract : Generic recognition systems contain language models which are representative of a broad corpus. In actual practice, however, recognition is usually on a coherent text covering a single topic, suggesting that knowledge of the topic at hand can be used to advantage. A base model can be augmented with information from a small sample of domain-specific language data to significantly improve recognition performance. Good performance may be obtained by merging in only those n-grams that include words that are out of vocabulary with respect to the base model.
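
Illustration : The merging idea in the last sentence of the abstract can be sketched roughly as follows. This is a minimal sketch and not the paper's actual procedure: it assumes raw bigram counts rather than a smoothed model, and the function names, vocabulary, and sample sentence are all hypothetical. It adds a domain n-gram to the base counts only when at least one of its words is missing from the base vocabulary.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def merge_oov_ngrams(base_counts, base_vocab, domain_tokens, n=2):
    """Copy the base counts and add only those domain n-grams that
    contain at least one word outside the base vocabulary.
    (Hypothetical helper; raw counts stand in for a real model.)"""
    merged = Counter(base_counts)
    for gram in ngrams(domain_tokens, n):
        if any(word not in base_vocab for word in gram):
            merged[gram] += 1
    return merged


# Toy base model: vocabulary and bigram counts (made-up values).
base_vocab = {"show", "me", "the", "flights", "to"}
base_counts = Counter({
    ("show", "me"): 5,
    ("me", "the"): 4,
    ("the", "flights"): 3,
})

# Small domain sample; "itinerary" and "pittsburgh" are out of vocabulary.
domain_tokens = "show me the itinerary to pittsburgh".split()

merged = merge_oov_ngrams(base_counts, base_vocab, domain_tokens)
for gram, count in sorted(merged.items()):
    print(gram, count)
```

In this toy setup the in-vocabulary bigrams from the domain sample leave the base counts unchanged, while the three bigrams touching an out-of-vocabulary word are added. A real system would also need to renormalize probabilities and choose how to weight the domain data, which this sketch does not attempt.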