“Z-Code++: A Pre-Trained Language Model Optimized for Abstractive Summarization”, 2022-08-21 ():
This paper presents Z-Code++ [Z-Code & GODEL, part of Project Z-Code], a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques.
First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks: the model is first pre-trained on text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation.
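A rough sketch of such a two-phase schedule is below; the model interface, objectives, dataset fields, and step counts are placeholders for illustration, not the paper's actual recipe.

```python
# Hypothetical two-phase pre-training loop. Assumes a seq2seq model whose
# forward pass returns an object with a .loss attribute (HuggingFace-style);
# all names and step counts are illustrative assumptions.
def pretrain_two_phase(model, text_corpus, summarization_corpus, optimizer,
                       phase1_steps=100_000, phase2_steps=20_000):
    # Phase 1: pre-train on generic text corpora for language understanding.
    for _, batch in zip(range(phase1_steps), text_corpus):
        loss = model(input_ids=batch["input_ids"],
                     labels=batch["denoising_targets"]).loss
        loss.backward(); optimizer.step(); optimizer.zero_grad()

    # Phase 2: continually pre-train on summarization corpora so that
    # generation stays grounded in the source document.
    for _, batch in zip(range(phase2_steps), summarization_corpus):
        loss = model(input_ids=batch["document_ids"],
                     labels=batch["summary_ids"]).loss
        loss.backward(); optimizer.step(); optimizer.zero_grad()
```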
Second, we replace self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively.
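The summary does not spell out the layer itself; the following is a minimal PyTorch sketch of DeBERTa-style disentangled attention under assumed dimensions, where content-to-content, content-to-position, and position-to-content terms are summed to form the attention scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledSelfAttention(nn.Module):
    """Minimal sketch of disentangled attention: each token has a content
    vector, and relative positions have their own embeddings. Dimensions and
    max_rel_pos are illustrative assumptions."""

    def __init__(self, d_model=512, n_heads=8, max_rel_pos=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_c = nn.Linear(d_model, d_model)   # content query
        self.k_c = nn.Linear(d_model, d_model)   # content key
        self.v_c = nn.Linear(d_model, d_model)   # content value
        self.q_p = nn.Linear(d_model, d_model)   # position query
        self.k_p = nn.Linear(d_model, d_model)   # position key
        self.rel_emb = nn.Embedding(2 * max_rel_pos, d_model)  # relative positions
        self.max_rel_pos = max_rel_pos
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                         # x: (batch, seq, d_model)
        b, n, _ = x.shape
        split = lambda t: t.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        qc, kc, vc = split(self.q_c(x)), split(self.k_c(x)), split(self.v_c(x))

        # Clipped relative-position indices in [0, 2*max_rel_pos - 1].
        pos = torch.arange(n, device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(-self.max_rel_pos,
                                                  self.max_rel_pos - 1) + self.max_rel_pos
        p = self.rel_emb(rel)                                       # (n, n, d_model)
        kp = self.k_p(p).view(n, n, self.n_heads, self.d_head)      # position key
        qp = self.q_p(p).view(n, n, self.n_heads, self.d_head)      # position query

        # content-to-content + content-to-position + position-to-content
        c2c = torch.einsum("bhid,bhjd->bhij", qc, kc)
        c2p = torch.einsum("bhid,ijhd->bhij", qc, kp)
        p2c = torch.einsum("bhjd,ijhd->bhij", kc, qp)
        attn = F.softmax((c2c + c2p + p2c) / (3 * self.d_head) ** 0.5, dim=-1)
        out = torch.einsum("bhij,bhjd->bhid", attn, vc)
        return self.out(out.transpose(1, 2).reshape(b, n, -1))
```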
Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner.
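As a rough illustration (the chunk size and the split between local and global layers are assumptions, not details from the paper), fusion-in-encoder can be sketched as local encoding of fixed-size chunks followed by global attention over the fused sequence:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionInEncoder(nn.Module):
    """Hierarchical encoding sketch: lower "local" layers encode each chunk of
    a long input independently, then upper "global" layers attend over the
    concatenated chunk representations. All hyperparameters are illustrative."""

    def __init__(self, d_model=512, n_heads=8, n_local=4, n_global=2, chunk=256):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.local_layers = nn.TransformerEncoder(layer(), num_layers=n_local)
        self.global_layers = nn.TransformerEncoder(layer(), num_layers=n_global)
        self.chunk = chunk

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, n, d = x.shape
        pad = (-n) % self.chunk                 # pad so the sequence splits evenly
        x = F.pad(x, (0, 0, 0, pad))
        chunks = x.view(b * (x.size(1) // self.chunk), self.chunk, d)
        local = self.local_layers(chunks)       # each chunk encoded independently
        fused = local.view(b, -1, d)[:, :n]     # re-concatenate, drop padding
        return self.global_layers(fused)        # full attention across all chunks
```

In this sketch the quadratic cost of self-attention is paid only over chunk-length windows in the local layers, while the (fewer) global layers see the whole sequence.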
Z-Code++ sets a new state of the art on 9 of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600× larger PaLM-540B on XSum, and the fine-tuned 200× larger GPT-3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.