Bibliography (5):
Pay Attention to MLPs
Attention Is All You Need
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Wikipedia Bibliography:
https://en.wikipedia.org/wiki/Natural_language_processing :
https://en.wikipedia.org/wiki/Natural_language_processing