IMPROVING THE RELIABILITY OF
COMMODITY OPERATING SYSTEMS

Michael Swift, Brian N. Bershad, and Henry M. Levy

University of Washington

Overview

Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures.

Nooks is a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery.
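
The resource-tracking idea can be illustrated with a minimal user-space sketch in C. It is not the Nooks implementation or API: the names driver_domain, domain_track, domain_untrack, and domain_recover are hypothetical, and the sketch assumes each tracked resource comes with a release callback. It covers only the bookkeeping needed for clean-up, not the hardware-enforced isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is an illustration, not the Nooks API. */

typedef void (*release_fn)(void *resource);

/* One kernel resource currently held by a driver. */
struct tracked_resource {
    void *resource;
    release_fn release;              /* how to give the resource back */
    struct tracked_resource *next;
};

/* Per-driver record of everything the driver still holds. */
struct driver_domain {
    struct tracked_resource *resources;  /* most recently acquired first */
    size_t count;
};

/* Record that the driver acquired `resource`. Returns 0, or -1 if out of memory. */
int domain_track(struct driver_domain *d, void *resource, release_fn release)
{
    assert(d != NULL);
    struct tracked_resource *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->resource = resource;
    t->release = release;
    t->next = d->resources;
    d->resources = t;
    d->count++;
    return 0;
}

/* Forget a resource the driver released on its own. Returns 0 if it was tracked. */
int domain_untrack(struct driver_domain *d, void *resource)
{
    struct tracked_resource **link = &d->resources;
    while (*link != NULL) {
        struct tracked_resource *t = *link;
        if (t->resource == resource) {
            *link = t->next;
            free(t);
            d->count--;
            return 0;
        }
        link = &t->next;
    }
    return -1;
}

/* After a driver failure, release everything it still holds.
 * Returns the number of resources reclaimed. */
size_t domain_recover(struct driver_domain *d)
{
    size_t reclaimed = 0;
    while (d->resources != NULL) {
        struct tracked_resource *t = d->resources;
        d->resources = t->next;
        if (t->release != NULL)
            t->release(t->resource);
        free(t);
        reclaimed++;
    }
    d->count = 0;
    return reclaimed;
}
```

Because the record is kept outside the driver, recovery in this sketch does not depend on the failed driver's own state being intact: domain_recover walks the list and releases whatever remains.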

Results

We implemented Nooks for the Linux 2.4.18 kernel. Nooks supports network interface cards, sound cards, the VFAT file system, and the kHTTPd web server. Using synthetic fault injection, we found that Nooks' isolation services prevented 99% of the crashes that would otherwise have occurred.