Simplifying the development of fault-tolerant distributed applications


Author(s) : L. E. Moser, P. M. Melliar-Smith
Publisher : N/A
Publication Date : 1995
ISSN : N/A
Abstract : The primary objective of our work has been to make the programming of fault-tolerant distributed systems easier, quicker, and less prone to error. Multicast group communication protocols provide a good foundation upon which to build such systems. Delivery of messages in causal order is useful because it precludes anomalies in which an effect precedes its cause. It is, however, our position that distributed systems are easier to understand and simpler to program when messages are totally ordered rather than causally ordered. If processes receive exactly the same messages in exactly the same total order, they perform exactly the same actions and maintain exactly the same state. In the past, total ordering was substantially more expensive than causal ordering, but that is no longer the case. Modern group communication systems, such as our Totem system [1], can reliably multicast totally ordered messages in local-area networks with throughput as high, and with overheads and latency as low, as the best causally ordered protocols and even as the best point-to-point FIFO protocols. There is no longer any need to incur the more difficult programming of causally ordered messages, let alone of unordered messages. For long-distance transcontinental and intercontinental communication, where propagation delays are significant, causally ordered messages may still be appropriate. While the latency for totally ordered messages is now quite good, additional latency is incurred if a message cannot be delivered until all other processes in the group have acknowledged its receipt (we use the term "safe" for this because the database community,