Parallel and distributed search for structure in multivariate time series


Author(s) : Paul R. Cohen, Matthew D. Schmill, Tim Oates
Publisher : N/A
Publication Date : 1997
ISSN : N/A
Abstract : Efficient data mining algorithms are crucial for effective knowledge discovery. We present the Multi-Stream Dependency Detection (MSDD) data mining algorithm that performs a systematic search for structure in multivariate time series of categorical data. The systematicity of MSDD's search makes implementation of both parallel and distributed versions straightforward. Distributing the search for structure over multiple processors or networked machines makes mining of large numbers of databases or very large databases feasible. We present results showing that MSDD efficiently finds complex structure in multivariate time series, and that the distributed version finds the same structure in approximately 1/n of the time required by MSDD, where n is the number of machines across which the search is distributed. MSDD differs from other data mining algorithms in the complexity of the structure that it can find. MSDD also requires no domain knowledge to focus or limit its search, although such knowledge is easily incorporated when it is available.