Bibliography (5):

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  2. MASS: Masked Sequence to Sequence Pre-training for Language Generation

  3. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

  4. Attention Is All You Need

  5. Wikipedia Bibliography:

    1. Convolutional neural network