Tagging English text with a probabilistic model


Author(s) : Bernard Merialdo
Publisher : N/A
Publication Date : 1994
ISSN : N/A
Abstract : In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: using text that has been tagged by hand and computing relative frequency counts, and using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle. Experiments show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure routinely used to estimate Hidden Markov Model parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.
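
Illustration : The relative frequency training mentioned in the abstract can be sketched roughly as follows. This is a minimal sketch, not the paper's implementation: it assumes a triclass model factored into a tag transition probability p(t3 | t1, t2) and a lexical probability p(word | tag), and it applies no smoothing. The function name, the `<s>` boundary tag, the tag names, and the two-sentence corpus are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def relative_frequency_estimates(tagged_sentences):
    """Estimate triclass transition and lexical probabilities by counting.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (transition, emission) where
      transition[(t1, t2, t3)] approximates p(t3 | t1, t2)
      emission[(tag, word)]    approximates p(word | tag)
    """
    trigram_counts = Counter()   # occurrences of the tag triple (t1, t2, t3)
    context_counts = Counter()   # occurrences of (t1, t2) as a context
    emission_counts = Counter()  # occurrences of (tag, word)
    tag_counts = Counter()       # occurrences of each tag

    for sentence in tagged_sentences:
        # Assumption: pad with a hypothetical boundary tag "<s>" so that
        # the first two words of a sentence also have a two-tag context.
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            trigram_counts[(tags[i - 2], tags[i - 1], tags[i])] += 1
            context_counts[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sentence:
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    # Relative frequencies: each count divided by the count of its context.
    transition = {
        triple: count / context_counts[triple[:2]]
        for triple, count in trigram_counts.items()
    }
    emission = {
        pair: count / tag_counts[pair[0]]
        for pair, count in emission_counts.items()
    }
    return transition, emission


# Hypothetical hand-tagged corpus, for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
transition, emission = relative_frequency_estimates(corpus)
```

With this toy corpus, "dog" and "cat" each account for half of the NOUN occurrences, so the estimated p("dog" | NOUN) is 0.5. Training on untagged text would instead start from some initial estimate and re-estimate these same tables as a hidden Markov process, which is not shown here.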