Database selection using actual physical and acquired logical collection resources in a massive domain-specific operational environment


Author(s) : Mortera Meziou, Peter Jackson, Xi Guo, Jack G. Conrad
Publisher : N/A
Publication Date : 2002
ISSN : N/A
Abstract : The continued growth of very large data environments such as Westlaw, Dialog, and the World Wide Web increases the importance of effective and efficient database selection and searching. Recent research has focused on autonomous and automatic collection selection, searching, and results merging in distributed environments. These studies often rely on TREC data and queries for experimentation. We have extended this work to West's on-line production environment, where thousands of legal, financial, and news databases are accessed by up to a quarter-million professional users each day. Using the WIN natural language search engine, a cousin to UMass's INQUERY, along with a collection retrieval inference network (CORI) to provide database scoring, we examine the effect that a set of optimized parameters has on database selection performance. We also compare current language modeling techniques to this approach. Traditionally, West's information has been structured over 15,000 online databases, representing roughly 6 terabytes of textual data. Given the expense of running global searches in this environment, it is usually not practical to perform full document retrieval over the entire collection. It is therefore necessary to create a new infrastructure to support automatic database selection in the service of broader searching. In this research, we represent our operational environment in two distinct ways.