|
Abstract : |
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which execute efficiently on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event when most or all processors have nonuniformly sized personalized messages to exchange with each other. We first present an algorithm for the h-relation personalized communicationwhose efficient implementation will allow high performance implementations of a large class of algorithms. We then consider how to effectively use these communication primitives to address the problem of sorting. Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Another variation using regular sampling for choosing the splitters has similar performance with deterministic guaranteed bounds on the memory and communication requirements. Both of these variations efficiently handle the presence of duplicates without the overhead of tagging each element. The personalized communication and sorting algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines, |