- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- MASS: Masked Sequence to Sequence Pre-training for Language Generation
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Attention Is All You Need