Bibliography (6):

  1. Zhihu question: https://www.zhihu.com/question/456443707

  2. Zhihu column article: https://zhuanlan.zhihu.com/p/367666974

  3. Brown et al. (2020). Language Models are Few-Shot Learners (GPT-3).

  4. Zeng et al. (2021). PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation.

  5. Vaswani et al. (2017). Attention Is All You Need.