The immense hardware complexity of modern computers, both mobile phones and datacenter servers, is a seemingly endless source of bugs and vulnerabilities in system software.
Classical OSes cannot address this, since they only run on a small subset of the machine. The issue is interactions within the entire ensemble of firmware blobs, co-processors, and CPUs that we term the de facto OS. The current “Whack-a-mole” approach will not solve this problem, nor will clean-slate redesign: it is simply not possible to replace some firmware components and the engineering effort is too great.
Our response, instead, is to build a high-level model of exactly what a given real hardware and software platform consists of, and captures for the first time the necessary and assumed trust relationships between the software contexts executing on different components (CPUs, devices, etc.).
This principled but pragmatic approach allows us to make rigorous statements about the hodgepodge of software & firmware at the heart of modern computers.
We expect these statements to be, at first, depressingly weak, but it may be the only way to identify changes that provably increase the trustworthiness of a real system, and quantify the benefits of these changes.
…2.1 The growing threat of cross-SoC bugs: Protection in a modern computer system is about much more than programming the MMU correctly to ensure isolation between different user processes and the kernel: the OS must interact with hundreds of devices that can access arbitrary memory locations via DMA and increasingly have their own processors, running their own system software. Mutual trust between such devices and their drivers can lead to serious problems, and despite the existence of IOMMUs or System MMUs, new “cross-SoC” attacks which rely on compromising an intelligent device are regularly demonstrated.
Some exploit incorrect or incomplete IOMMU configuration,14, 17 while others exploit subtle features of how data structures are shared with peripheral devices.16 Preventing such bugs by verification seems hard: many years after its introduction, seL4’s correctness proofs either rely on the absence of DMA devices, or assume they are trusted.20
Often, device firmware is much less rigorously engineered than, say, the Linux kernel, and less likely to be updated. Remote code execution vulnerabilities have been demonstrated for many of the Wi-Fi chips in mobile devices;4, 11, 21 all exploit buffer overflows using specially crafted packets. OS kernel’s mitigations like Kernel Address Space Layout Randomization (KASLR) and Executable Space Protection (ESP)13 are often missing from peripherals.4
Figure 1: A cross-SoC attack vulnerability.
Figure 1 shows this: a Wi-Fi Digital Signal Processor (DSP) is compromised over the air, and a further bug in the device driver allows arbitrary RAM to be mapped to the Wi-Fi DSP via the IOMMU, allowing the DSP firmware to access the CPU kernel’s private memory and compromise it.
The authors of these attacks all suggest that this is likely the tip of the iceberg for these kinds of problems. Classenet al2022 show exploits spreading between peripherals without involving the OS kernel on the CPU,8 using buffers shared between Bluetooth and Wi-Fi chips to attack one from the other. In Figure 1, compromised Bluetooth firmware can in turn compromise the Wi-Fi firmware since it can access on-chip RAM containing the firmware.
The current software response is to fix each particular bug in the device driver that allowed the exploit to spread to the main CPU and move on—a game of “Whack-a-mole” that results in every new bug opening a window of vulnerability in a large number of deployed devices. With new hardware appearing all the time, this approach is doing nothing to make the problem go away.
…2.3 How did we get here? The current state of affairs, where almost every production computer runs a de facto OS which nobody designs, and whose behavior and functionality cannot be specified, has come about due to a set of inter-dependent factors and trends.