Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
{"title":"Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization","authors":"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang","doi":"arxiv-2208.09770","DOIUrl":null,"url":null,"abstract":"This paper presents Z-Code++, a new pre-trained language model optimized for\nabstractive text summarization. The model extends the state of the art\nencoder-decoder model using three techniques. First, we use a two-phase\npre-training process to improve model's performance on low-resource\nsummarization tasks. The model is first pre-trained using text corpora for\nlanguage understanding, and then is continually pre-trained on summarization\ncorpora for grounded text generation. Second, we replace self-attention layers\nin the encoder with disentangled attention layers, where each word is\nrepresented using two vectors that encode its content and position,\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and\nfew-shot settings, our model substantially outperforms the competing models.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - General Literature","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2208.09770","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder architecture with three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks: the model is first pre-trained on text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented by two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method for encoding long sequences in a hierarchical manner. Z-Code++ achieves a new state of the art on 9 out of 13 text summarization tasks across 5 languages. The model is parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, it substantially outperforms competing models.
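To make the second technique more concrete, the sketch below shows one way a disentangled attention layer can be written, following the DeBERTa-style decomposition of attention scores into content-to-content, content-to-position, and position-to-content terms. This is a minimal single-head illustration under stated assumptions, not the paper's implementation; the class name, dimensions, and relative-position clipping (`DisentangledSelfAttention`, `d_model`, `max_rel_pos`) are choices made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledSelfAttention(nn.Module):
    """Minimal sketch of a disentangled attention layer (single head).

    Each token is represented by two vectors: a content embedding and a
    relative-position embedding. The attention score is the sum of
    content-to-content, content-to-position, and position-to-content terms.
    Hyperparameters and the relative-position scheme are illustrative only.
    """

    def __init__(self, d_model=64, max_rel_pos=16):
        super().__init__()
        self.d_model = d_model
        self.max_rel_pos = max_rel_pos
        # Projections for the content vectors
        self.q_c = nn.Linear(d_model, d_model)
        self.k_c = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Shared relative-position embedding table and its projections
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, d_model)
        self.q_p = nn.Linear(d_model, d_model)
        self.k_p = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model) content representations
        b, n, d = x.shape
        qc, kc, v = self.q_c(x), self.k_c(x), self.v(x)

        # Relative positions j - i, clipped to [-max_rel_pos, max_rel_pos]
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_pos, self.max_rel_pos)
        p = self.rel_emb(rel + self.max_rel_pos)          # (n, n, d)
        kp, qp = self.k_p(p), self.q_p(p)                 # position keys / queries

        # Content-to-content scores: (b, n, n)
        c2c = qc @ kc.transpose(-1, -2)
        # Content-to-position: content query i attends to the position key of pair (i, j)
        c2p = torch.einsum("bid,ijd->bij", qc, kp)
        # Position-to-content: position query of pair (i, j) attends to content key j
        p2c = torch.einsum("ijd,bjd->bij", qp, kc)

        scores = (c2c + c2p + p2c) / (3 * d) ** 0.5
        attn = F.softmax(scores, dim=-1)
        return attn @ v


if __name__ == "__main__":
    layer = DisentangledSelfAttention()
    out = layer(torch.randn(2, 10, 64))   # -> (2, 10, 64)
    print(out.shape)
```

The point the abstract highlights is that content and position information are kept in separate vectors and combined only inside the attention score, rather than being summed into a single input embedding as in a standard Transformer encoder.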