“Grokking Group Multiplication With Cosets”, 2023-12-11:
The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. Many techniques have been developed to interpret deep neural networks, but all have substantial limitations.
Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network through an end-to-end approach. Building on previous work, we completely reverse engineer fully connected one-hidden-layer networks that have “grokked” the arithmetic of the permutation groups S5 and S6.
The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group's subgroups. We describe how we reverse engineered the model's mechanisms and confirmed that our theory was a faithful description of the circuit's functionality.
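The key mathematical fact behind a coset-based decomposition is that for a normal subgroup H of G, the coset containing a product a·b is determined entirely by the cosets of a and b. As an illustrative sketch only (not the paper's learned circuit), the snippet below checks this for the alternating subgroup A4 inside S4, chosen instead of S5 purely to keep the exhaustive check small:

```python
from itertools import permutations

# Work in S4 (24 elements) for brevity; the paper's models act on S5/S6.
n = 4
G = list(permutations(range(n)))

def compose(a, b):
    # Function composition of permutations: (a ∘ b)(i) = a(b(i)).
    return tuple(a[b[i]] for i in range(n))

def parity(p):
    # Number of inversions mod 2: 0 for even permutations (the subgroup A_n),
    # 1 for odd ones. The two parity classes are exactly the cosets of A_n.
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inv % 2

# Because A_n is normal in S_n, its cosets form the quotient group Z/2,
# so the coset of a*b depends only on the cosets of a and b.
for a in G:
    for b in G:
        assert parity(compose(a, b)) == (parity(a) + parity(b)) % 2
print("coset of a*b is determined by the cosets of a and b")
```

For non-normal subgroups the left cosets still partition the group, but they no longer form a quotient group, so a circuit exploiting them must track more than one coset label at a time.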
We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. (2023), which claims to find a different algorithm for this same problem.
…We succeed in completely reverse engineering the model and enumerating the diverse circuits that it converges on to implement the multiplication of the symmetric group.
Our work does not, however, represent an unmitigated success for the project of mechanistic interpretability. The prior work of Chughtai et al. (2023) studied the exact same model and setting, but came to completely different conclusions. Understanding why our interpretations of the same data diverged from theirs required extensive effort (see Appendix 7 for a thorough comparison).
We find that even in a setting as simple and well understood as group arithmetic, it is incredibly difficult to do interpretability research and be confident about one's conclusions.