Bibliography (4):

  1. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

  2. Attention Is All You Need

  3. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

  4. Wikipedia Bibliography:

    1. Convolutional neural network