Detecting subject boundaries within text: A language independent statistical approach


Author(s) : Korin Richmond
Publisher : N/A
Publication Date : 1997
ISSN : N/A
Abstract : We describe here an algorithm for detecting subject boundaries within text based on a statistical lexical similarity measure. Hearst has already tackled this problem with good results (Hearst, 1994). One of her main assumptions is that a change in subject is accompanied by a change in vocabulary. Using this assumption, but by introducing a new measure of word significance, we have been able to build a robust and reliable algorithm which exhibits improved accuracy without sacrificing language independency.
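
The assumption stated in the abstract, that a change in subject is accompanied by a change in vocabulary, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not define its word significance measure, so the sketch substitutes plain word counts and cosine similarity between neighbouring blocks of sentences. All function names, the `window` size, and the `threshold` value are hypothetical choices made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Language-independent in spirit: lowercase runs of word characters,
    # with no stop list, stemmer, or dictionary.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bags of words (Counter objects).
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, window=2):
    # For each gap between sentences, compare the vocabulary of up to
    # `window` sentences before the gap with up to `window` after it.
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        sims.append(cosine(left, right))
    return sims


def find_boundaries(sentences, window=2, threshold=0.1):
    # Assumed decision rule: a gap is a boundary when its similarity is a
    # local minimum and falls below `threshold`. Ties between adjacent
    # gaps are all reported; a real system would need to resolve them.
    sims = gap_similarities(sentences, window)
    boundaries = []
    for i, s in enumerate(sims):
        left_ok = i == 0 or s <= sims[i - 1]
        right_ok = i == len(sims) - 1 or s <= sims[i + 1]
        if s < threshold and left_ok and right_ok:
            boundaries.append(i + 1)  # index of the sentence that starts the new subject
    return boundaries


if __name__ == "__main__":
    sample = [
        "the cat chased the mouse",
        "the mouse fled the cat",
        "stock markets fell sharply",
        "markets fell on stock fears",
    ]
    print(find_boundaries(sample))
```

In this sketch every word counts equally. A significance-weighted measure of the kind the abstract describes would replace the raw counts in `gap_similarities` with per-word weights, leaving the rest of the pipeline unchanged.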