Application Error Recovery in Critical Information Systems
Report
Critical infrastructure applications provide services upon which society depends heavily; these applications are themselves dependent on distributed information systems for all aspects of their operation, and so survivability of the information systems is an important issue. Fault tolerance is a key mechanism by which survivability can be achieved in these information systems. Fault tolerance consists of two primary stages, error detection and error recovery; in this paper we focus on application error recovery in these critical information systems. We outline a specification-based approach to error recovery that enables systematic structuring of error recovery specifications, an implementation partially synthesized from the formal specification, and various forms of static and run-time analysis. We present the RAPTOR specification notation for describing error recovery activities in the face of various faults, and we explore synthesis of implementation code using the Error Recovery Translator. We also describe a novel implementation architecture enabling error recovery in these systems and discuss issues in analysis.
All rights reserved (no additional license for public reuse)
English
Knight, John, and Matthew Elder. "Application Error Recovery in Critical Information Systems." University of Virginia Dept. of Computer Science Tech Report (2000).
University of Virginia, Department of Computer Science
2000