Data selection for support vector machine classification


Author(s) : Olvi L. Mangasarian, Glenn Fung
Publisher : N/A
Publication Date : 2000
ISSN : N/A
Abstract : The problem of extracting a minimal number of data points from a large dataset, in order to generate a support vector machine (SVM) classifier, is formulated as a concave minimization problem and solved by a finite number of linear programs. This minimal set of data points, which is the smallest number of support vectors that completely characterize a separating plane classifier, is considerably smaller than that required by a standard 1-norm support vector machine with or without feature selection. The proposed approach also incorporates a feature selection procedure that results in a minimal number of input features used by the classifier. Tenfold cross validation gives as good or better test results using the proposed minimal support vector machine (MSVM) classifier based on the smaller set of data points compared to a standard 1-norm support vector machine classifier. The reduction in data points used by an MSVM classifier over those used by a 1-norm SVM classifier averaged 66% on seven public datasets and was as high as 81%. This makes MSVM a useful incremental classification tool which maintains only a small fraction of a large dataset before merging and processing it with new incoming data.
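
Note : For orientation, the standard 1-norm SVM that the abstract uses as its baseline is commonly written as the linear program below. This is a sketch of the usual textbook formulation, not a transcription from the paper, and all notation is an assumption: the m data points are the rows of A in R^{m x n}, D is the m x m diagonal matrix of +1/-1 class labels, e is a vector of ones, y is the vector of slack variables, nu > 0 is a trade-off weight, and the separating plane is x^T w = gamma.

```latex
% Standard 1-norm SVM (assumed notation: A, D, e, y, \nu, w, \gamma)
\min_{w,\gamma,y}\; \nu\, e^{\top} y + \|w\|_{1}
\quad \text{subject to} \quad
D(Aw - e\gamma) + y \ge e, \qquad y \ge 0 .

% Equivalent linear program: bound |w| componentwise by an auxiliary vector s
\min_{w,\gamma,y,s}\; \nu\, e^{\top} y + e^{\top} s
\quad \text{subject to} \quad
D(Aw - e\gamma) + y \ge e, \qquad -s \le w \le s, \qquad y \ge 0 .
```

In this formulation the support vectors are usually taken to be the data points whose constraints carry nonzero dual multipliers at the solution. The 1-norm on w tends to drive components of w to zero, which is the feature selection effect the abstract mentions. The concave minimization that MSVM adds to reduce the number of support vectors is not reproduced here.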