“Crash-Only Software”, 2003-05-01:
Crash-only programs crash safely and recover quickly. There is only one way to stop such software—by crashing it—and only one way to bring it up—by initiating recovery. Crash-only systems are built from crash-only components, and the use of transparent component-level retries hides intra-system component crashes from end users.
In this paper we advocate a crash-only design for Internet systems, showing that it can lead to more reliable, predictable code and faster, more effective recovery.
We present ideas on how to build such crash-only Internet services, taking successful techniques to their logical extreme.
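The abstract's two mechanisms, start-up that is always recovery and component-level retries that hide crashes from the caller, can be illustrated with a small sketch. This is not code from the paper. `CrashOnlyCounter`, `ComponentCrash`, `call_with_retry`, and the JSON state file are hypothetical names chosen for illustration, and the crash is simulated with an exception rather than a real process kill.

```python
import json
import os
import tempfile


class ComponentCrash(Exception):
    """Hypothetical stand-in for an abrupt component failure."""


class CrashOnlyCounter:
    """Toy component with no clean-shutdown path: start-up is always recovery."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.recover()

    def recover(self):
        # The only way to bring the component up: reload the last durable state.
        try:
            with open(self.state_path) as f:
                self.value = json.load(f)["value"]
        except FileNotFoundError:
            self.value = 0  # first start: nothing has been persisted yet

    def increment(self, fail=False):
        new_value = self.value + 1
        if fail:
            # Simulated crash before the update becomes durable.
            raise ComponentCrash("crashed mid-request")
        # Write to a temporary file, then rename atomically, so a crash
        # never leaves a half-written state file behind.
        tmp_path = self.state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"value": new_value}, f)
        os.replace(tmp_path, self.state_path)
        self.value = new_value
        return new_value


def call_with_retry(component, failures, max_attempts=3):
    """Retry transparently: on a crash, recover the component and try again.

    `failures` is an illustrative knob: the first `failures` attempts crash.
    """
    for attempt in range(max_attempts):
        try:
            return component.increment(fail=attempt < failures)
        except ComponentCrash:
            component.recover()
    raise RuntimeError("component did not recover within the retry budget")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "counter.json")
    counter = CrashOnlyCounter(path)
    # Two simulated crashes are absorbed; the caller sees only the result.
    print(call_with_retry(counter, failures=2))
```

The retry is safe in this sketch only because the simulated crash happens before the durable write, so a repeated request cannot apply the increment twice. A component that can crash after persisting its state would need idempotent requests or some form of request identifiers for retries to remain transparent.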