“Exploring Low Rank Training of Deep Neural Networks”, Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Yarin Gal, Aidan N. Gomez (2022-09-27):

Training deep neural networks in low rank, i.e. with factorized layers, is of particular interest to the community: it offers efficiency over unfactorized training in terms of both memory consumption and training time.
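To make the efficiency claim concrete, here is a minimal sketch (in NumPy, with illustrative dimensions of our choosing, not taken from the paper) of a factorized linear layer: the dense weight W of shape (d_out, d_in) is replaced by a product U·V of inner rank r, cutting parameters from d_out·d_in to r·(d_out + d_in) when r is small.

```python
import numpy as np

# Illustrative dimensions (not from the paper): a 768x768 layer
# factorized with inner rank 64.
d_in, d_out, r = 768, 768, 64

rng = np.random.default_rng(0)
U = rng.standard_normal((d_out, r)) / np.sqrt(r)    # trained factor, (d_out, r)
V = rng.standard_normal((r, d_in)) / np.sqrt(d_in)  # trained factor, (r, d_in)

def factorized_forward(x):
    # Two skinny matmuls replace one dense matmul:
    # (batch, d_in) -> (batch, r) -> (batch, d_out)
    return x @ V.T @ U.T

x = rng.standard_normal((4, d_in))
y = factorized_forward(x)

dense_params = d_out * d_in          # 589824
lowrank_params = r * (d_out + d_in)  # 98304, roughly a 6x reduction
print(y.shape, dense_params, lowrank_params)
```

The same two-matmul structure also reduces training-time FLOPs whenever r is much smaller than the layer's dimensions, which is the efficiency argument the abstract refers to.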

Prior work has focused on low-rank approximations of pre-trained networks and on training in low-rank space with additional objectives, offering various ad hoc explanations for the chosen practices.

We analyze techniques that work well in practice and, through extensive ablations on models such as GPT-2, provide evidence falsifying common beliefs in the field, hinting in the process at exciting research questions that still need answering.