Abstract:
The author is very grateful to Wilfredo Leiva-Maldonado for helpful conversations, suggestions and theoretical support. Comments from other members of the Department were also very useful. This work was partially supported by the Spanish DGES grant PB96-0300.

Principal curves have been defined (Hastie and Stuetzle 1989) as smooth curves passing through the middle of a multidimensional data set. They are nonlinear generalizations of the first principal component, and a characterization of that component is the basis for the definition of principal curves. In this paper we propose an alternative approach based on a different property of principal components. Consider a point in the space where a multivariate normal distribution is defined. For each hyperplane containing that point, compute the total variance of the normal distribution conditioned to lie in that hyperplane. Now choose the hyperplane that minimizes this conditional total variance and find the corresponding conditional mean. The first principal component of the original distribution passes through this conditional mean and is orthogonal to that hyperplane. This property generalizes easily to data sets with nonlinear structure. Repeating the search from different starting points yields many points analogous to conditional means. We call them principal oriented points. A one-dimensional curve that passes through the set of these special points is called a principal curve of oriented points. Successive principal curves are defined recursively from a generalization of the total variance.
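The property of the multivariate normal described above can be checked numerically. The sketch below is an illustration only and is not the paper's procedure. It assumes a bivariate normal with an arbitrarily chosen mean `mu`, covariance `sigma` and starting point `x0`. It uses the standard Gaussian conditioning formulas: on the hyperplane bᵀx = bᵀx0 the conditional covariance is Σ − Σbbᵀ Σ / (bᵀΣb). In place of a real optimizer, it runs a brute-force scan over unit normals b. The helper names `conditional_total_variance`, `conditional_mean` and `principal_oriented_direction` are hypothetical.

```python
import numpy as np

def conditional_total_variance(sigma, b):
    """Trace of the covariance of N(mu, sigma) conditioned on b^T x = const.

    Conditioning on the linear functional b^T X leaves the covariance
    sigma - sigma b b^T sigma / (b^T sigma b); its trace is the total variance.
    """
    sb = sigma @ b
    cond_cov = sigma - np.outer(sb, sb) / (b @ sb)
    return np.trace(cond_cov)

def conditional_mean(mu, sigma, x0, b):
    """Mean of N(mu, sigma) conditioned on the hyperplane b^T (x - x0) = 0."""
    sb = sigma @ b
    return mu + sb * (b @ (x0 - mu)) / (b @ sb)

def principal_oriented_direction(sigma, n_angles=3600):
    """Hypothetical brute-force search (2-D only) for the unit normal b
    that minimizes the conditional total variance."""
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    normals = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    tv = [conditional_total_variance(sigma, b) for b in normals]
    return normals[int(np.argmin(tv))]

# Assumed example distribution and starting point (arbitrary values).
mu = np.array([1.0, -2.0])
sigma = np.array([[3.0, 1.2], [1.2, 1.0]])
x0 = np.array([0.5, 0.5])

b_star = principal_oriented_direction(sigma)
m = conditional_mean(mu, sigma, x0, b_star)

# First principal direction: eigenvector of the largest eigenvalue
# (numpy.linalg.eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(sigma)
v1 = eigvecs[:, -1]

# The optimal hyperplane is orthogonal to v1, so its normal is parallel to v1.
print(abs(b_star @ v1))  # close to 1

# The conditional mean lies on the first principal component line mu + t * v1.
resid = (m - mu) - ((m - mu) @ v1) * v1
print(np.linalg.norm(resid))  # close to 0
```

The result follows from the trace of the conditional covariance, which equals tr(Σ) − bᵀΣ²b / bᵀΣb. The ratio is a weighted average of the eigenvalues of Σ, so it is largest, and the conditional total variance smallest, when b is the leading eigenvector v1. In that case Σb = λ₁v1, and the conditional mean reduces to μ + v1 v1ᵀ(x0 − μ), a point on the first principal component.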