|
Abstract : |
Non-demonstrative or inductive reasoning is a crucial component in the skills of a learner. A leading candidate for this form of reasoning involves the automatic formation of hypotheses. Initial successes in the construction of propositional theories have now been followed by algorithms that attempt to generalise sentences in the predicate calculus. An important defect in these new-generation systems is the lack of a clear model for theory justification. In this paper we describe a method of evaluating the significance of a hypothesis based on the degree to which it allows compression of the observed data with respect to prior knowledge. This can be measured by comparing the lengths of the input and output tapes of a reference Turing machine which will generate the examples from the hypothesis and a set of derivational proofs. The model extends an earlier approach of Muggleton by allowing for noise. The truth values of noisy instances are switched by making use of correction codes. The utility of compression as a significance measure is evaluated empirically in three independent domains. In particular, the results show that the existence of compression distinguishes a larger number of significant clauses than other significance tests. The method also appears to distinguish noise as incompressible data. 1, |