"Deep Differentiable Logic Gate Networks", 2022-10-15:
[code; music version] Recently, research has increasingly focused on developing efficient neural network architectures. In this work, we explore logic gate networks for machine learning tasks by learning combinations of logic gates. These networks comprise logic gates such as "AND" and "XOR", which allow for very fast execution. The difficulty in learning logic gate networks is that they are conventionally non-differentiable and therefore do not allow training with gradient descent.
Thus, to allow for effective training, we propose differentiable logic gate networks, an architecture that combines real-valued logics and a continuously parameterized relaxation of the network.
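The relaxation works by giving each neuron a learnable distribution over the 16 possible two-input Boolean functions, with each function replaced by a real-valued (probabilistic) form that agrees with the hard gate on {0, 1} inputs, so gradients can flow. A minimal NumPy sketch of this idea (function names here are illustrative, not taken from the paper's code):

```python
import numpy as np

# Probabilistic real-valued relaxations of all 16 two-input Boolean gates.
# For inputs a, b in [0, 1] (read as probabilities), each relaxation
# reproduces the hard gate exactly on {0, 1} inputs.
GATES = [
    lambda a, b: np.zeros_like(a),    # FALSE
    lambda a, b: a * b,               # AND
    lambda a, b: a - a * b,           # a AND NOT b
    lambda a, b: a,                   # a
    lambda a, b: b - a * b,           # NOT a AND b
    lambda a, b: b,                   # b
    lambda a, b: a + b - 2 * a * b,   # XOR
    lambda a, b: a + b - a * b,       # OR
    lambda a, b: 1 - (a + b - a * b), # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # XNOR
    lambda a, b: 1 - b,               # NOT b
    lambda a, b: 1 - b + a * b,       # a OR NOT b
    lambda a, b: 1 - a,               # NOT a
    lambda a, b: 1 - a + a * b,       # NOT a OR b
    lambda a, b: 1 - a * b,           # NAND
    lambda a, b: np.ones_like(a),     # TRUE
]

def soft_gate(a, b, logits):
    """Training-time neuron: softmax-weighted mixture over the 16
    relaxed gates; differentiable in both the inputs and the logits."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return sum(p[i] * g(a, b) for i, g in enumerate(GATES))

def hard_gate(a, b, logits):
    """Inference-time neuron: discretize to the most probable gate."""
    return GATES[int(np.argmax(logits))](a, b)
```

Training pushes each neuron's logits toward one gate; at inference the softmax mixture collapses to that single gate, leaving a pure Boolean circuit.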
The resulting discretized logic gate networks achieve fast inference speeds, e.g., beyond a million images of MNIST per second on a single CPU core.
…Logic gate networks allow for very fast classification, with speeds beyond a million images per second on a single CPU core (for MNIST at >97.5% accuracy). The computational cost of a layer with n neurons is Θ(n) with very small constants (as only logic gates of Booleans are required), while, in comparison, a fully-connected layer (with m input neurons) requires Θ(n · m) computations with substantially larger constants (as it requires floating-point arithmetic). While the training can be more expensive than for regular neural networks (however, just by a constant, and asymptotically less expensive), to our knowledge, the proposed method is the fastest available architecture at inference time.
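The "Θ(n) with very small constants" claim follows from discretized inference being pure bitwise work: packing one bit per example into machine words lets a single AND/OR/XOR instruction evaluate one gate for 64 examples simultaneously, with no floating-point arithmetic at all. A hypothetical NumPy illustration of this bit-packed evaluation (the paper's released code does the real thing in compiled C/CUDA; the names below are made up):

```python
import numpy as np

def eval_layer(left, right, gate_ops):
    """Evaluate one layer of n Boolean gates on 64 examples at once.

    left, right: uint64 arrays of shape (n,); each word packs the
        bit-value of that gate's input wire for 64 examples.
    gate_ops: the discretized Boolean function chosen per gate
        (only a few of the 16 shown here).
    Cost: one bitwise machine op per gate, i.e. Theta(n) total.
    """
    out = np.empty_like(left)
    for i, (a, b, op) in enumerate(zip(left, right, gate_ops)):
        if op == "AND":
            out[i] = a & b
        elif op == "OR":
            out[i] = a | b
        elif op == "XOR":
            out[i] = a ^ b
        elif op == "NAND":
            out[i] = ~(a & b)
    return out
```

Contrast with a dense float layer: each of the n outputs needs m multiply-adds (Θ(n · m)), and each multiply-add handles one example, not 64.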
Overall, our method accelerates inference speed (in comparison to fully-connected ReLU neural networks) by around two orders of magnitude. In the experiments, we scale the training of logic gate networks up to 5 million parameters, which can be considered relatively small in comparison to other architectures. In comparison to the fastest neural networks at 98.4% on MNIST, our method is more than 12× faster than the best binary neural networks and 2–3 orders of magnitude faster than the theoretical speed of sparse neural networks.
[This sounds like it would be extremely expensive to train large-scale networks on because you're doing a many-way choice for each and every parameter, and each parameter is an extremely weak one (and I'm not sure about the asymptotic claim there) which doesn't benefit from the inductive bias of any architecture at all, not even the minimal MLP arch. But it might be an ideal way to distill & sparsify a pretrained neural network down into something that can be turned into an absurdly fast, small, energy-efficient ASIC: convert it layer by layer, and then finetune it end-to-end, and then shrink it by pruning gates.]