Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation

IF 4.2 1区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Transactions of the Association for Computational Linguistics Pub Date : 2023-04-24 DOI:10.1162/tacl_a_00582

Fei Huang, Pei Ke, Minlie Huang

{"title":"Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation","authors":"Fei Huang, Pei Ke, Minlie Huang","doi":"10.1162/tacl_a_00582","DOIUrl":null,"url":null,"abstract":"Abstract Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.1","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"941-959"},"PeriodicalIF":4.2000,"publicationDate":"2023-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1162/tacl_a_00582","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 3

Abstract

Abstract Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.1

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

高质量非自回归文本生成的有向无环变压器预训练

摘要在机器翻译中，非自回归（NAR）文本生成模型由于其显著较快的解码速度和良好的生成质量而备受关注。然而，在更广泛的文本生成任务中，现有的NAR模型缺乏适当的预训练，这使得它们仍然远远落后于预训练的自回归模型。在本文中，我们提出了预训练的有向无循环变换器（PreDAT）和一种新的预训练任务，以提高NAR生成中的预测一致性。在五个文本生成任务上的实验表明，我们的PreDAT显著优于现有的预训练的NAR模型（平均得分+4.2），甚至在基于n-gram的度量中取得了比预训练的自回归基线更好的结果，吞吐量提高了17倍。进一步的分析表明，PreDAT受益于无偏预测顺序，它缓解了自回归生成中的误差累积问题，这为NAR生成的优势提供了新的见解。1

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Transactions of the Association for Computational Linguistics Multiple-

CiteScore

32.60

自引率

4.60%

发文量

审稿时长

8 weeks

期刊介绍： The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.