“Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft”, Corby Rosset, 2020-02-10:

[History of large-parameter natural language neural network models since ELMo (0.094b LSTM, February 2018) to Turing-NLG (17b, February 2020)]

Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state-of-the-art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics for feedback and research purposes. <|endoftext|>

—This summary was generated by the Turing-NLG language model itself.

…Following the trend that larger natural language models lead to better results, Microsoft is introducing Turing Natural Language Generation (T-NLG), at 17 billion parameters the largest model ever published, which outperforms the state-of-the-art on a variety of language modeling benchmarks and also excels when applied to numerous practical tasks, including summarization and question answering. This work would not be possible without breakthroughs produced by the DeepSpeed library (compatible with PyTorch) and the ZeRO optimizer, which are described further in an accompanying blog post.
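For context on the tooling mentioned above: DeepSpeed is configured through a JSON file passed at launch, and ZeRO is enabled via the `zero_optimization` section of that file. A minimal sketch of such a configuration follows; the field names match DeepSpeed's documented config schema, but the specific values are illustrative only, not the settings actually used to train T-NLG.

```json
{
  "train_batch_size": 512,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1
  }
}
```

Here `train_batch_size` is the global batch size across all GPUs, `fp16` turns on mixed-precision training, and `"stage": 1` requests ZeRO's first optimization stage, which partitions optimizer states across data-parallel workers to cut per-GPU memory.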