“Exploring Low Rank Training of Deep Neural Networks”, 2022-09-27 ():
Training deep neural networks in low rank, i.e., with factorized layers, is of particular interest to the community: it reduces both memory consumption and training time compared with unfactorized training.
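To make the factorized-layer idea concrete, here is a minimal PyTorch-style sketch of a low-rank linear layer; the class name, initialization scale, and rank are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer with weight factorized as W ~ U @ V (rank r).

    Parameter count drops from d_out * d_in to r * (d_in + d_out),
    a saving whenever r < (d_in * d_out) / (d_in + d_out).
    """

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        # Scaled Gaussian init; the paper may use a different scheme.
        self.U = nn.Parameter(torch.randn(d_out, rank) / rank**0.5)
        self.V = nn.Parameter(torch.randn(rank, d_in) / d_in**0.5)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two skinny matmuls replace one dense matmul: (x @ V^T) @ U^T.
        return x @ self.V.T @ self.U.T + self.bias
```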
Prior work has focused on low-rank approximations of pre-trained networks and on training in low-rank space with additional objectives, offering various ad hoc explanations for the chosen practices.
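The pre-trained-network route typically means truncated SVD of the learned weights; a hedged sketch of that step (the function name and return convention are my assumptions):

```python
def low_rank_approx(W: torch.Tensor, rank: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Best rank-r approximation of a pre-trained weight matrix in the
    Frobenius norm (Eckart-Young): W ~ (U_r * S_r) @ Vh_r."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Fold the singular values into the left factor, then truncate to rank r.
    return U[:, :rank] * S[:rank], Vh[:rank, :]
```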
We analyze techniques that work well in practice, and through extensive ablations on models such as GPT-2 we provide evidence falsifying common beliefs in the field, pointing in the process to exciting open questions that remain to be answered.