“Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition”, Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet, 2023-10-10:

[blog] We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory (SLT).

We derive a closed formula for the theoretical loss and, in the case of two hidden dimensions, discover that regular k-gons are critical points.
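To make the k-gon configuration concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes the usual TMS form, ReLU(WᵀWx + b) with a 2 × c weight matrix W, and inputs with a single active feature whose magnitude is uniform on [0, 1]. The function names (`kgon_weights`, `tms_forward`, `sample_inputs`, `empirical_loss`) and the unit side length are illustrative choices. The loss is a Monte Carlo estimate, not the closed formula the abstract mentions.

```python
import numpy as np

def kgon_weights(k, c, length=1.0):
    """2 x c matrix whose first k columns are the vertices of a regular k-gon.

    The remaining c - k columns are zero. `length` is a hypothetical
    common column norm, not a value from the paper.
    """
    angles = 2 * np.pi * np.arange(k) / k
    W = np.zeros((2, c))
    W[0, :k] = length * np.cos(angles)
    W[1, :k] = length * np.sin(angles)
    return W

def tms_forward(W, b, x):
    """Assumed TMS map ReLU(W^T W x + b) for a batch x of shape (n, c)."""
    return np.maximum(x @ W.T @ W + b, 0.0)

def sample_inputs(n, c, rng):
    """Assumed input distribution: one active feature, magnitude ~ U[0, 1]."""
    idx = rng.integers(0, c, size=n)
    mu = rng.uniform(0.0, 1.0, size=n)
    x = np.zeros((n, c))
    x[np.arange(n), idx] = mu
    return x

def empirical_loss(W, b, x):
    """Mean squared reconstruction error over the batch."""
    return np.mean(np.sum((x - tms_forward(W, b, x)) ** 2, axis=1))

rng = np.random.default_rng(0)
x = sample_inputs(200_000, 6, rng)
W5 = kgon_weights(5, 6)            # a 5-gon embedded among 6 features
loss_5gon = empirical_loss(W5, np.zeros(6), x)
loss_zero = empirical_loss(np.zeros((2, 6)), np.zeros(6), x)
```

Comparing `loss_5gon` with `loss_zero` gives a rough sense of how much a k-gon arrangement reduces reconstruction error under these assumptions. It does not verify that the configuration is a critical point.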

We present supporting theory indicating that the local learning coefficient (a geometric invariant) of these k-gons determines phase transitions in the Bayesian posterior as a function of training sample size.
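The mechanism can be summarized by the standard free-energy asymptotics from SLT. The sketch below is an assumption about the form of the argument, not a formula quoted from the abstract: $\mathcal{W}_\alpha$ denotes a neighborhood of a critical point $w^*_\alpha$, $L_n$ the empirical loss, $\lambda_\alpha$ the local learning coefficient, and lower-order terms are dropped.

```latex
% Local free energy of phase \alpha at sample size n (leading terms only)
F_n(\mathcal{W}_\alpha) \approx n\, L_n(w^*_\alpha) + \lambda_\alpha \log n

% A phase \beta with lower loss but higher complexity
% (L_n(w^*_\beta) < L_n(w^*_\alpha), \lambda_\beta > \lambda_\alpha)
% has lower free energy, and so dominates the posterior, once
n \left( L_n(w^*_\alpha) - L_n(w^*_\beta) \right) > (\lambda_\beta - \lambda_\alpha) \log n
```

Because the left side grows linearly in $n$ and the right side only logarithmically, the posterior is expected to move from simple, high-loss phases to complex, low-loss phases as the sample size increases.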

We then show empirically that the same k-gon critical points also determine the behavior of SGD training.

The picture that emerges adds evidence to the conjecture that the SGD learning trajectory is subject to a sequential learning mechanism. Specifically, we find that the learning process in TMS, be it through SGD or Bayesian learning, can be characterized by a journey through parameter space from regions of high loss and low complexity to regions of low loss and high complexity.