Abstract
We describe an algorithm for detecting subject boundaries within text based on a statistical lexical similarity measure. Hearst has already tackled this problem with good results (Hearst, 1994). One of her main assumptions is that a change in subject is accompanied by a change in vocabulary. Building on this assumption and introducing a new measure of word significance, we have built a robust and reliable algorithm that exhibits improved accuracy without sacrificing language independence.