“Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Brian Bailey, 2022-01-13:

In his 2021-12-07 DAC keynote “GPUs, Machine Learning, and EDA”, Bill Dally, chief scientist and senior VP of research at Nvidia, compared some of the processors his company has developed with custom accelerators for AI. “The overhead of fetching and decoding, all the overhead of programming, of having a programmable engine, is on the order of 10%–20%—small enough that there’s really no gain to a specialized accelerator. You get at best 20% more performance and lose all the advantages and flexibility that you get by having a programmable engine”, he said.

Later in his talk he broke this down into a little more detail:

If you are doing a single half-precision floating-point multiply/add (HFMA), which is where we started with Volta, your energy per operation is about 1.5 picojoules, and your overhead is 30 picojoules [see Figure 2]. You’ve got a 20× overhead. You’re spending 20× as much energy on the general administration as you are in the engineering department. But if you start amortizing (using more complex instructions), you get to only 5× with the dot product instruction, 20% with the half-precision matrix multiply accumulate (HMMA), and 16% for the integer multiply accumulate (IMMA).

At that point, the advantages of programmability are so large, there’s no point making a dedicated accelerator. You’re much better off building a general-purpose programmable engine, like a GPU, and having some instructions you accelerate.
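The arithmetic behind these ratios is simple amortization: the per-instruction fetch/decode overhead is roughly fixed, so packing more multiply-accumulates (MACs) into each instruction divides it down. A minimal sketch, using the ~30 pJ overhead and ~1.5 pJ-per-MAC figures from the talk; the MAC counts per instruction are illustrative assumptions, not figures Dally gave:

```python
OVERHEAD_PJ = 30.0  # per-instruction fetch/decode overhead (from the talk)
MAC_PJ = 1.5        # energy per half-precision multiply-add (from the talk)

def overhead_ratio(macs_per_instruction: int) -> float:
    """Overhead energy divided by useful math energy for one instruction."""
    return OVERHEAD_PJ / (macs_per_instruction * MAC_PJ)

print(overhead_ratio(1))   # HFMA: one MAC per instruction -> 20.0 (20x overhead)
print(overhead_ratio(4))   # assumed 4-element dot product -> 5.0 (5x overhead)

# MACs per instruction needed to push overhead down to ~20%,
# roughly the HMMA figure quoted above:
macs_for_20_percent = OVERHEAD_PJ / (0.20 * MAC_PJ)
print(macs_for_20_percent)  # 100.0
```

Under these assumptions, an instruction only needs to amortize the overhead over on the order of a hundred MACs before programmability costs a few tens of percent rather than 20×, which is the crux of Dally's argument.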

Figure 2: Specialized Instructions Amortize Overhead

That does not sit well with many people, and it certainly is not reflected by the billions of dollars of venture capital flowing into AI accelerators.

[Keynote summary:

“GPU-accelerated computing and machine learning (ML) have revolutionized computer graphics, computer vision, speech recognition, and natural language processing. We expect ML and GPU-accelerated computing will also transform EDA software and, as a result, chip design workflows. Recent research shows that orders-of-magnitude speedups are possible with accelerated computing platforms and that the combination of GPUs and ML can enable automation of tasks previously seen as intractable or too difficult to automate.

This talk will cover near-term applications of GPUs and ML to EDA tools and chip design, as well as a long-term vision of what is possible.

The talk will also cover advances in GPUs and ML-hardware that are enabling this revolution.”]