“The Design Process for Google’s Training Chips: TPUv2 and TPUv3”, Thomas Norrie, Nishant Patil, Doe Hyun Yoon, George Kurian, Sheng Li, James Laudon, Cliff Young, Norman Jouppi, David Patterson, 2021:

Five years ago, few would have predicted that a software company like Google would build its own computers. Nevertheless, Google has been deploying computers for machine learning (ML) training since 2017, powering key Google services. These Tensor Processing Units (TPUs) are composed of chips, systems, and software, all co-designed in-house.

In this paper, we detail the circumstances that led to this outcome, the challenges and opportunities observed, the approach taken for the chips, a quick review of performance, and finally a retrospective on the results.

A companion paper describes the supercomputers built from these chips, the compiler, and a detailed performance analysis.