Bibliography (3):
RoBERTa: A Robustly Optimized BERT Pretraining Approach
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
CPM-2: Large-scale Cost-effective Pre-trained Language Models
Wikipedia Bibliography:
Ceiling effect (statistics)