UMass at TREC 2002: Cross language and novelty tracks


Author(s) : Courtney Wade, Alvaro Bolivar, Margaret E. Connell, James Allan, Leah S. Larkey
Publisher : N/A
Publication Date : 2002
ISSN : N/A
Abstract : {larkey | allan | connell | alvarob | cwade}@cs.umass.edu

The University of Massachusetts participated in the cross-language and novelty tracks this year. The cross-language submission was characterized by combination of evidence to merge results from two different retrieval engines and a variety of different resources: stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this year's queries. For the novelty track, we applied variants of techniques that have been employed for other problems. We also created additional training data by manually annotating 48 more topics.

1. Cross Language Track

We submitted one monolingual run and four cross-language runs. For the monolingual run, the technology was essentially the same as the system we used for TREC 2001. For the cross-language runs, we integrated some new elements into the Arabic system: a light stemmer that was the result of extensive research [10], the standard probabilistic dictionary based on the UN bilingual lexicon, an expanded dictionary, acronym expansion [12], language modeling, and relevance modeling. In addition, we made extensive use of combination of evidence. Because our submitted runs were the results of combination of evidence (combining ranked lists from multiple IR runs), we use the term sub-run to refer to the individual component runs before combination. We first describe the,