|
Abstract : |
Many existing learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of inherently distributed data. One approach we explore for handling a large data set is to partition the data into disjoint subsets, run the learning algorithm on each subset, and combine the results by some means. Results to date are promising, but in nearly all cases the accuracy of the final combined classifier is lower than that of a single classifier computed from the entire data set. In this paper we evaluate our approach to learning from partitioned data, called meta-learning, when the restriction that the training subsets be disjoint is relaxed, i.e., some replication of training data across subsets is allowed. We anticipated that data replication would improve overall accuracy; however, our findings suggest the contrary. |
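
The partition, train, and combine scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `replication` parameter, and the toy nearest-centroid base learner are all hypothetical, and simple majority voting stands in for the combining step, which the abstract leaves unspecified.

```python
import random
from collections import Counter, defaultdict


def partition_with_replication(data, n_subsets, replication=0.0, seed=0):
    """Split data into n_subsets disjoint parts. If replication > 0, each part
    additionally receives that fraction of the examples held by the other
    parts (a hypothetical replication scheme, assumed for illustration)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    disjoint = [shuffled[i::n_subsets] for i in range(n_subsets)]
    if replication <= 0:
        return disjoint
    replicated = []
    for i, subset in enumerate(disjoint):
        others = [x for j, s in enumerate(disjoint) if j != i for x in s]
        extra = rng.sample(others, int(replication * len(others)))
        replicated.append(subset + extra)
    return replicated


def train_centroid(examples):
    """Toy base learner: store the mean of a 1-D feature per class and
    predict the class whose mean is closest."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    centroids = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


def majority_vote(classifiers, x):
    """Combine base classifiers by taking the most common prediction."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Synthetic two-class data, invented purely for this sketch.
data = [(float(i), "low") for i in range(10)] + \
       [(float(i), "high") for i in range(20, 30)]

subsets = partition_with_replication(data, n_subsets=4, replication=0.2)
classifiers = [train_centroid(s) for s in subsets]
prediction = majority_vote(classifiers, 3.0)
```

Setting `replication=0.0` recovers the disjoint case, so the same code can compare the two settings on a held-out set.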