TextBox 2.0: A Text Generation Library with Pre-trained Language Models

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 435-444. Pub Date: 2022-12-26. DOI: 10.48550/arXiv.2212.13005

Tianyi Tang, Junyi Li, Z. Chen, Yiwen Hu, Zhuohao Yu, Wen-Dao Dai, Zican Dong, Xiaoxue Cheng, Yuhao Wang, Wayne Xin Zhao, J. Nie, Ji-rong Wen


Citations: 3

Abstract

To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets, and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, our library is easy to use, either through the friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at https://github.com/RUCAIBox/TextBox.
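The idea of a single interface spanning data loading, training, and evaluation can be illustrated with a minimal sketch. Every name below (`Pipeline`, `toy_loader`, `toy_trainer`, `exact_match`) is hypothetical and chosen only for illustration; this is not TextBox's actual API, and the "training" step is a stand-in that does no learning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout: this is NOT TextBox's actual API.
Example = Tuple[str, str]  # (source text, reference text)


@dataclass
class Pipeline:
    """Chains the three stages behind one entry point."""
    load_data: Callable[[str], List[Example]]
    train: Callable[[List[Example]], Callable[[str], str]]
    evaluate: Callable[[List[str], List[str]], Dict[str, float]]

    def run(self, dataset: str) -> Dict[str, float]:
        examples = self.load_data(dataset)       # stage 1: data loading
        generate = self.train(examples)          # stage 2: training
        outputs = [generate(src) for src, _ in examples]
        references = [ref for _, ref in examples]
        return self.evaluate(outputs, references)  # stage 3: evaluation


def toy_loader(name: str) -> List[Example]:
    # Stand-in for a dataset reader; returns a fixed in-memory sample.
    return [("hello world", "HELLO WORLD"), ("text box", "TEXT BOX")]


def toy_trainer(examples: List[Example]) -> Callable[[str], str]:
    # Stand-in for fine-tuning: ignores the data and returns an upper-casing generator.
    return lambda src: src.upper()


def exact_match(outputs: List[str], references: List[str]) -> Dict[str, float]:
    # Fraction of outputs that equal their reference exactly.
    hits = sum(o == r for o, r in zip(outputs, references))
    return {"exact_match": hits / len(references)}


pipeline = Pipeline(toy_loader, toy_trainer, exact_match)
scores = pipeline.run("toy")
print(scores)
```

Because each stage is passed in as a plain callable, swapping the dataset, model, or metric would not change the `run` call, which is the kind of uniformity the abstract describes.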


