Abstract:
In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so that it can reprompt for fresh input or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features that predict when a recognition hypothesis contains a word error more accurately than the acoustic confidence score thresholds traditionally used in automatic speech recognition. We present analytic results indicating that there are significant prosodic differences between correctly and incorrectly recognized turns in the TOOT train information corpus. We then present machine learning results showing that using prosodic features to automatically predict correctly versus incorrectly recognized turns improves over the use of acoustic confidence scores alone.