Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems

Author: Ferrari, Adam, Department of Computer Science, University of Virginia

The Process Introspection project is a design and implementation effort whose main goal is to construct a general-purpose, flexible, efficient checkpoint/restart mechanism for high performance heterogeneous distributed systems. The primary constraint on this mechanism is platform independence: a checkpoint produced on one architecture or operating system must be restartable on a different architecture or operating system. The Process Introspection mechanism is based on a design pattern for constructing interoperable checkpointable modules. Application of the design pattern is automated by two levels of software tools: a library of support routines that facilitates use of the pattern, and a source code translator that automatically applies the pattern to platform-independent modules. A prototype implementation of the library has been constructed and used to demonstrate that the design pattern can be applied effectively to construct platform-independent checkpointable programs that operate efficiently.
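
As a rough illustration of what a platform-independent checkpointable module might look like, the sketch below has a small computation poll for a checkpoint request at a well-defined point, save its logical state in a fixed byte layout, and later resume from that saved state. It is not the report's library or translator output: the names `save_state`, `restore_state`, and `sum_of_squares`, the `checkpoint_at` trigger, and the big-endian fixed-width layout are all assumptions made for this example.

```python
import struct

# Hypothetical platform-independent layout (an assumption for this sketch):
# big-endian, fixed-width fields with no padding.
# ">qqd" = int64 loop index, int64 upper bound, float64 running total.
STATE_FORMAT = ">qqd"


def save_state(i, n, total):
    """Encode the module's logical state in a host-independent byte layout."""
    return struct.pack(STATE_FORMAT, i, n, total)


def restore_state(blob):
    """Decode a state blob produced by save_state on any host."""
    return struct.unpack(STATE_FORMAT, blob)


def sum_of_squares(n, checkpoint_at=None, resume_from=None):
    """Checkpointable module.

    Returns ("done", result) when the computation finishes, or
    ("checkpoint", blob) when a checkpoint is taken at a poll point.
    """
    if resume_from is not None:
        # Restart path: rebuild the logical state from the checkpoint.
        i, n, total = restore_state(resume_from)
    else:
        i, total = 0, 0.0

    while i < n:
        # Poll point: state is well defined here, so it is safe to capture.
        if checkpoint_at is not None and i == checkpoint_at:
            return ("checkpoint", save_state(i, n, total))
        total += float(i * i)
        i += 1

    return ("done", total)
```

The key design choice in this sketch is that the checkpoint records logical values (loop index, bound, running total) rather than a raw memory image, so the same blob can be decoded on a host with a different byte order or word size. Calling `sum_of_squares(10, checkpoint_at=4)` yields a blob that can be passed back through `resume_from` to finish the computation.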

All rights reserved (no additional license for public reuse)
Source Citation:

Ferrari, Adam. "Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems." University of Virginia Dept. of Computer Science Tech Report (1996).

University of Virginia, Department of Computer Science
Published Date: