
Clustering for approximate similarity search in high-dimensional spaces


Author(s) : Hector Garcia-Molina, Edward Y. Chang, Chen Li, Gio Wiederhold
Publisher : N/A
Publication Date : 2002
ISSN : N/A
Abstract : In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and perform significantly better than other approaches. Our scheme is based on finding clusters, and then building a simple but efficient index for them. We analyze the tradeoffs involved in clustering and building such an index structure, and present experimental results and a Web-based image-database prototype that we have built.
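The abstract describes the scheme only at a high level: find clusters, then build a simple index over them so that a query reads only the clusters nearest the target. The sketch below illustrates that general cluster-then-index idea under stated assumptions. It is not Clindex itself. It swaps in plain k-means (Lloyd's algorithm) for the paper's own clustering step, models each cluster as one bucket standing in for a contiguous disk read, and uses hypothetical names (`ClusterIndex`, `search`, `probes`, `exact_knn`, `recall`). Raising `probes` scans more clusters, trading extra reads for higher recall, which mirrors the recall-versus-IO tradeoff the abstract mentions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (monotone in true distance, so fine for ranking)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 25, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; an assumed stand-in for the paper's clustering step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments stabilized
            break
        centroids = updated
    return centroids


class ClusterIndex:
    """Hypothetical cluster-then-index structure: a centroid table plus one bucket per cluster."""

    def __init__(self, points: List[Vector], k: int, seed: int = 0) -> None:
        self.points = points
        self.centroids = kmeans(points, k, seed=seed)
        # Each bucket models one cluster stored contiguously (assumed: one read per bucket).
        self.buckets: Dict[int, List[int]] = {c: [] for c in range(k)}
        for pid, p in enumerate(points):
            c = min(range(k), key=lambda j: sq_dist(p, self.centroids[j]))
            self.buckets[c].append(pid)

    def search(self, query: Vector, k: int, probes: int = 1) -> List[int]:
        """Approximate k-NN: scan only the `probes` clusters whose centroids are closest."""
        order = sorted(self.buckets, key=lambda c: sq_dist(query, self.centroids[c]))
        candidates = [pid for c in order[:probes] for pid in self.buckets[c]]
        candidates.sort(key=lambda pid: sq_dist(query, self.points[pid]))
        return candidates[:k]


def exact_knn(points: List[Vector], query: Vector, k: int) -> List[int]:
    """Brute-force k-NN, used as ground truth for recall."""
    return sorted(range(len(points)), key=lambda pid: sq_dist(query, points[pid]))[:k]


def recall(approx: List[int], exact: List[int]) -> float:
    """Fraction of the true neighbors that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [tuple(rng.gauss(0, 1) for _ in range(8)) for _ in range(1000)]
    index = ClusterIndex(data, k=10)
    q = data[0]
    truth = exact_knn(data, q, 10)
    for probes in (1, 3, 10):
        print(probes, recall(index.search(q, k=10, probes=probes), truth))
```

Running the module prints recall at 10 for one query as `probes` grows. Scanning every cluster degenerates into an exact linear scan, so recall at that setting is 1 (barring exact distance ties); the interesting regime is a small `probes`, where most true neighbors are found after reading only a few buckets.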