Abstract:
In the past few years, the Large Vocabulary Conversational Speech Recognition (LVCSR) community has begun to address speech recognition in languages other than English. Work on the CallHome corpora has shown that current technology is largely language-independent, and that the dominant factor in performance on a given language is the amount of training data available ([1]). This raises the question of how best to proceed when a recognizer must be brought up quickly in a new language for which little or no training data is available. This paper addresses exactly that question. We assume that, while only a couple of hours of transcribed data are available, much more untranscribed data can be found, and we explore ways to make use of it.