Application Error Recovery in Critical Information Systems

Authors: Knight, John, Department of Computer Science, University of Virginia; Elder, Matthew, Department of Computer Science, University of Virginia

Critical infrastructure applications provide services upon which society depends heavily. These applications are themselves dependent on distributed information systems for all aspects of their operation, so the survivability of those information systems is an important issue. Fault tolerance is a key mechanism by which survivability can be achieved in these information systems. Fault tolerance consists of two primary stages, error detection and error recovery; in this paper we focus on application error recovery in these critical information systems. We outline a specification-based approach to error recovery that enables systematic structuring of error recovery specifications, an implementation partially synthesized from the formal specification, and various forms of static and run-time analysis. We present the RAPTOR specification notation for describing error recovery activities in the face of various faults, and we explore synthesis of implementation code using the Error Recovery Translator. We also describe a novel implementation architecture enabling error recovery in these systems and discuss issues in analysis.
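The general idea of a specification-based approach (a declarative recovery specification, an implementation partially synthesized from it, and both static and run-time checks) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the fault names, the action names, the `SystemState` type, and the `synthesize_handlers` and `recover` functions are hypothetical. The dictionary specification is not RAPTOR notation, and the synthesis step is not the Error Recovery Translator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SystemState:
    """Hypothetical, highly simplified view of a distributed system's state."""
    isolated: List[str] = field(default_factory=list)
    rerouted: bool = False
    alert_level: int = 0


Action = Callable[[SystemState, str], None]


# Hypothetical primitive recovery actions available to the application.
def isolate_node(state: SystemState, node: str) -> None:
    if node not in state.isolated:
        state.isolated.append(node)


def reroute_traffic(state: SystemState, node: str) -> None:
    state.rerouted = True


def raise_alert_level(state: SystemState, node: str) -> None:
    state.alert_level += 1


ACTIONS: Dict[str, Action] = {
    "isolate_node": isolate_node,
    "reroute_traffic": reroute_traffic,
    "raise_alert_level": raise_alert_level,
}

# Hypothetical declarative recovery specification: detected fault -> ordered
# list of recovery actions. Illustrative only; this is NOT RAPTOR notation.
RECOVERY_SPEC: Dict[str, List[str]] = {
    "node_failure": ["isolate_node", "reroute_traffic"],
    "intrusion_detected": ["isolate_node", "raise_alert_level"],
}


def synthesize_handlers(spec: Dict[str, List[str]],
                        actions: Dict[str, Action]) -> Dict[str, Callable[[SystemState, str], None]]:
    """Build one executable handler per fault, checking the spec statically first."""
    handlers = {}
    for fault, names in spec.items():
        unknown = [n for n in names if n not in actions]
        if unknown:
            # Static analysis: the spec names an action that does not exist.
            raise ValueError(f"{fault}: unknown actions {unknown}")
        steps = [actions[n] for n in names]

        # Bind this fault's steps via a default argument (avoids late binding).
        def handler(state: SystemState, node: str, steps=steps) -> None:
            for step in steps:
                step(state, node)

        handlers[fault] = handler
    return handlers


def recover(handlers, state: SystemState, fault: str, node: str) -> None:
    """Run-time dispatch: apply the specified recovery for a detected fault."""
    if fault not in handlers:
        raise ValueError(f"no recovery specified for {fault!r}")
    handlers[fault](state, node)
```

For example, after `handlers = synthesize_handlers(RECOVERY_SPEC, ACTIONS)`, calling `recover(handlers, state, "intrusion_detected", "n7")` on a fresh `SystemState()` leaves `isolated == ['n7']` and `alert_level == 1`. Keeping the recovery policy in data rather than in hand-written handler code is what makes the static check in `synthesize_handlers` possible; a real specification language would also need to capture system-wide state and coordination across nodes, which this sketch omits.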

All rights reserved (no additional license for public reuse)
Source Citation:

Knight, John, and Matthew Elder. "Application Error Recovery in Critical Information Systems." University of Virginia Dept. of Computer Science Tech Report (2000).

University of Virginia, Department of Computer Science