Back To The Future: A Family of Algorithms for Termination Detection in Distributed Systems (Part 1)

A classical problem in distributed systems is detecting the termination of a distributed computation. Distributed Termination Detection (DTD) has been extensively studied in the past twenty years and it is known to be a difficult problem to solve efficiently, because it involves properties of the global state of a distributed system. Many DTD algorithms exists and a recent survey of 35 of them [1] introduces a taxonomy and identifies 8 different characteristics for their classification and evaluation. This survey concludes by remarking that ``[an] algorithm [with favorable ranking in all 8 dimensions,] if one exists, would be a huge development in this field.'' As difficult as DTD is in its classical setting, considerations for dynamicity and mobility in a distributed system further complicate the DTD problem and render most existing DTD algorithms non-applicable.

We introduce the notion of Apparent Causality as a relation among the messages in a system, from which we derive the concepts of message histories and futures. Apparent causality and message histories are inherently local properties which can be evaluated at the level of each process, whereas message futures are inherently global system-level properties. Histories and futures of messages are examples of histories and futures of more general observables in a distributed system. We propose Back To The Future (BTTF) as a generic method for computing futures from histories, and use this technique to construct three different algorithms:

  1. BTTF Transitory Quiescence (BTTF-TQ) is a generic, efficient algorithm that leads a distributed system to a state where there are no pending messages;
  2. Yet Another Wave Algorithm (YAWA) uses the BTTF technique to implement a generic DTD wave algorithm with certain interesting properties of its own; and
  3. BTTF Wave is our main algorithm, which combines BTTF-TQ and YAWA to obtain a general symmetric DTD algorithm that is equally suitable for classical settings as for dynamic systems of distributed mobile processes.

The BTTF Wave algorithm ranks quite favorably in the characterization scheme of [1]. Furthermore, it is generic and is suitable for dynamic and mobile systems at no extra cost. Our preliminary results indicate that while the theoretical worst-case message complexity of the BTTF Wave algorithm is no worse than other generic wave algorithms, such worst-case scenarios are possible only in unrealistic systems. The worst-case message complexity of this algorithm for realistic systems is significantly better, and its average message complexity in realistic systems is only a fraction of their total number of normal messages.

[1] ``A Taxonomy of Distributed Termination Detection Algorithms,'' J. Matocha and T. Camp, The Journal of Systems and Software, vol. 43, pp. 207-221, 1998.  

hosted by