|
Abstract : |
A simple method is presented which, loosely speaking, virtually removes noise or misfit from data, and thereby converts a "noise-free " algorithm A, which on-line learns linear functions from data without noise or misfit, into a "noise-tolerant " algorithm A nt which learns linear functions from data containing noise or misfit. Given some technical conditions, this conversion preserves optimality. For instance, the optimal noise-free algorithm B of Bernstein from [3] is converted into an optimal noisetolerant algorithm B nt. The conversion also works properly for all function classes which are closed under addition and contain linear functions as a subclass. In the second part of the paper, we show that Bernstein's on-line learning algorithm B can be converted into a batch learning algorithm B which consumes an (almost) minimal number of random training examples. This is true for a whole class of "pac-style " batch learning models (including learning with an (ffl; fl)-good model and learning with an ffl-bounded expected r-loss). The upper bound on the sample size is shown by applying a method of Littlestone from [9] which converts an online learning algorithm A into a batch learning algorithm A and relates the sample size of A to the total loss of A. For this purpose, Littlestone's method is extended from, |