Data-mining massive time series astronomical data sets - a case study


Author(s) : Markus Hegland, Zhexue Huang, Michael K. Ng
Publisher : N/A
Publication Date : 1998
ISSN : N/A
Abstract : In this paper we present a new application of data mining techniques to massive astronomical data sets. Previous data mining applications deal with time-independent multiple spectral astronomical data [2]. We are concerned with time series astronomical data. More precisely, our data consist of N time series, each with a duration of M days. The data set can be viewed as an N × M matrix in which the rows are different time series and the ordered columns correspond to consecutive time points at which measurements were made. Our primary objective in mining such data is to classify the time series according to their morphology and to identify classes that may carry a special morphological signature. The real data in this application consist of 40 million time series, each representing the brightness sequence (light curve) of a star measured daily in one of two spectral bands in the MACHO project [1]. In total, about 20 million stars (i.e., N ≈ 2 × 10^7) have been measured over the past four years, producing more than half a terabyte of time series data. The scientific discovery tasks in our data mining exercise are to discover new variable stars and to,