Home

Fault-tolerant Parallel Processing Combining Linda Checkpointing and Transactions


Author(s) : Karpjoo Jeong Karpjoo Jeong, 
Publisher : N/A
Publication Date : 1996
ISSN : N/A
Abstract : With the advent of high performance workstations and fast LANs, networks of workstations have recently emerged as a promising computing platform for long-running coarse grain parallel applications. Their advantages are wide availability and costeffectiveness, as compared to massively parallel computers. Long-running computation in the workstation environment, however, requires both fault tolerance and the effective utilization of idle workstations. In this dissertation, we present a variant of Linda, called Persistent Linda (PLinda), that treats these two issues uniformly: specifically, PLinda treats non-idleness as failure. PLinda provides a combination of checkpointing and transaction support on both data and program state (an encoding of continuations). The traditional transaction model is optimized and extended to support robust parallel computation. Treatable failures include processor and main memory hard and slowdown failures, and network omission and corruption failures. The programmer can customize fault tolerance when constructing an application, trading failure-free performance against recovery time. When creating a PLinda program,,