Scalable and Efficient MoE Training for Multitask Multilingual Models
GODEL: Large-Scale Pre-Training for Goal-Directed Dialog
https://www.microsoft.com/en-us/research/project/project-zcode/
https://arxiv.org/abs/2109.08668
Donβt Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
https://arxiv.org/abs/2005.02365