Abstract
High availability is essential to heterogeneous computer networks, which are the basis of many systems ranging from the Internet to fly-by-wire flight controls. Development of highly available systems, however, is constrained by ever shorter times to market and the availability of off-the-shelf hardware and software (see the “Examples” box). Consequently, the economic necessity of using commodity products from different vendors puts a premium on the products’ fault tolerance. The development of fault-tolerant and portable software, particularly for parallel and distributed systems consisting of networks of binary-incompatible machines, continues to challenge engineers.