Bibliography (4):
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Attention Is All You Need
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Wikipedia Bibliography:
Convolutional neural network