“Variational Dropout Sparsifies Deep Neural Networks”, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov, 2017-01-19:

We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation of Gaussian Dropout.

We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight.

Interestingly, it leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages.

We reduce the number of parameters by up to 280× on LeNet architectures and by up to 68× on VGG-like networks with a negligible decrease in accuracy.
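
To make "individual dropout rates per weight" concrete, here is a minimal NumPy sketch of how a per-weight rate can lead to a sparse layer. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the parameterization (a mean `theta` and a log-variance `log_sigma2` per weight, with dropout rate `alpha = sigma^2 / theta^2`), the pruning threshold of `log alpha > 3`, and all function names are assumptions chosen for this example.

```python
import numpy as np


def log_alpha(theta, log_sigma2, eps=1e-8):
    """Per-weight log dropout rate, assuming alpha = sigma^2 / theta^2."""
    # eps avoids log(0) for weights whose mean is exactly zero.
    return log_sigma2 - np.log(theta ** 2 + eps)


def keep_mask(theta, log_sigma2, threshold=3.0):
    """True for weights that are kept; weights with a large alpha are pruned.

    The threshold value is an assumption made for this sketch.
    """
    return log_alpha(theta, log_sigma2) < threshold


def noisy_linear(x, theta, log_sigma2, rng):
    """Training-time forward pass that samples pre-activations directly.

    With independent Gaussian noise on each weight, the output of x @ W is
    Gaussian with the mean and variance computed below.
    """
    mean = x @ theta
    var = (x ** 2) @ np.exp(log_sigma2)
    return mean + np.sqrt(var + 1e-8) * rng.standard_normal(mean.shape)


def sparse_linear(x, theta, log_sigma2):
    """Test-time forward pass using only the weights that survive pruning."""
    return x @ (theta * keep_mask(theta, log_sigma2))


def compression_ratio(theta, log_sigma2):
    """Total number of weights divided by the number of kept weights."""
    kept = int(keep_mask(theta, log_sigma2).sum())
    return theta.size / max(kept, 1)
```

In this sketch, a weight whose noise variance is large relative to its squared mean carries little signal, so zeroing it at test time changes the output very little; counting the surviving weights gives a compression ratio of the kind reported above.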