Home

Efficient translation of external input in a dynamically typed language


Author(s) : Robert Paige, 
Publisher : N/A
Publication Date : 1994
ISSN : N/A
Abstract : New algorithms are given to compile external data in string form into data structures for high level datatypes. Let I be a language of external constants formed from atomic constants and from set, multiset, and tuple constructors. We show how to read an input string C, decide whether it belongs to I, convert it to internal form, and build initial data structures storing the internal value of C in linear worst case time with respect to the number of symbols in C. The algorithm does not require hashing or address arithmetic, but relies only on list processing. A principal subproblem is to detect and remove duplicate elements from set-valued input. To solve this subproblem we extend the technique of multiset discrimination [2, 5] to detect all duplicate elements of a multiset, where these elements may themselves be tuples, multisets, or sets with arbitrary degree of nesting. To handle the case where the elements are multisets, we introduce a new technique called weak sorting, which sorts all of these multisets uniformly according to an arbitrary total order computed by the algorithm. The cost of computing this total order and of sorting all of the multisets is linear in the sum of the number of elements in each of the multisets. Our algorithms are based on a sequential pointer RAM model of computation [4, 7], which accesses and stores data using pointers but disallows address arithmetic (which precludes direct access into arrays). This improves on previous algorithms used to solve the related reading problem in SETL [3, 6]. Those algorithms used hashing even for deeply nested data to detect duplicate values. If we assume that hashing unit-space data takes unit expected time and linear worst case time, then for arbitrary data their algorithm would require linear expected time and quadratic worst case time in the number of symbols in C.,