CPM-2: Large-scale cost-effective pre-trained language models

Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun
{"title":"CPM-2: Large-scale cost-effective pre-trained language models","authors":"Zhengyan Zhang ,&nbsp;Yuxian Gu ,&nbsp;Xu Han ,&nbsp;Shengqi Chen ,&nbsp;Chaojun Xiao ,&nbsp;Zhenbo Sun,&nbsp;Yuan Yao,&nbsp;Fanchao Qi,&nbsp;Jian Guan,&nbsp;Pei Ke,&nbsp;Yanzheng Cai,&nbsp;Guoyang Zeng,&nbsp;Zhixing Tan,&nbsp;Zhiyuan Liu,&nbsp;Minlie Huang,&nbsp;Wentao Han,&nbsp;Yang Liu,&nbsp;Xiaoyan Zhu,&nbsp;Maosong Sun","doi":"10.1016/j.aiopen.2021.12.003","DOIUrl":null,"url":null,"abstract":"<div><p>In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely <span>infmoe</span>, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of <span>infmoe</span> when conducting inference of large-scale models having tens of billions of parameters on a single GPU. All source code and model parameters are available at <span>https://github.com/TsinghuaAI/CPM</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 216-224"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651021000310/pdfft?md5=46efc536c128aefd0ff69139f8627ddb&pid=1-s2.0-S2666651021000310-main.pdf","citationCount":"59","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI Open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666651021000310","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 59

Abstract

In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely infmoe, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of infmoe when conducting inference of large-scale models having tens of billions of parameters on a single GPU. All source code and model parameters are available at https://github.com/TsinghuaAI/CPM.
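The abstract names its techniques without detail; the two sketches below illustrate the general ideas behind prompt tuning and single-GPU MoE inference. In prompt tuning, the pre-trained backbone is frozen and only a small set of soft prompt vectors is learned per task, which is why the task-specific parameter count drops so sharply compared with full fine-tuning. The following is a minimal PyTorch sketch of that idea, not the CPM-2 code; the names (SoftPrompt, attach_soft_prompt, prompt_len) are illustrative assumptions.

```python
# Minimal sketch of prompt tuning with a frozen backbone (PyTorch).
# Not the CPM-2 implementation; all names here are illustrative.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to the input embeddings."""
    def __init__(self, prompt_len: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden) -> (batch, prompt_len + seq_len, hidden)
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

def attach_soft_prompt(backbone: nn.Module, prompt_len: int, hidden_size: int) -> SoftPrompt:
    """Freeze every pre-trained parameter; only the soft prompt is trainable."""
    for p in backbone.parameters():
        p.requires_grad = False
    return SoftPrompt(prompt_len, hidden_size)

# Only prompt_len * hidden_size values are task-specific, e.g. 100 * 4096
# (roughly 0.4M numbers, with 4096 as an assumed hidden size) versus the full
# 11B parameters, which is where the per-task storage saving comes from:
#   soft_prompt = attach_soft_prompt(backbone, prompt_len=100, hidden_size=4096)
#   optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=1e-3)
```

For single-GPU inference of a very large MoE model, which is what infmoe targets, the abstract does not describe the mechanism. A common way to make such inference fit on one GPU is to keep expert weights in host memory and move only the experts a batch actually routes to onto the device. The sketch below shows that offloading pattern under those assumptions; it is not the infmoe API.

```python
# Hedged sketch of on-demand expert offloading for MoE inference (PyTorch).
# Assumes keeping experts on CPU and loading routed experts per batch; this is
# not the infmoe implementation.
import torch
import torch.nn as nn

class OffloadedExperts(nn.Module):
    def __init__(self, num_experts: int, hidden_size: int, device: str = "cuda"):
        super().__init__()
        self.device = device
        # Expert weights stay in host memory until they are actually needed.
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)
        )

    @torch.no_grad()
    def forward(self, tokens: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, hidden); expert_ids: (num_tokens,) from the router.
        out = torch.zeros_like(tokens)
        for idx in expert_ids.unique().tolist():
            expert = self.experts[idx].to(self.device)   # load this expert on demand
            mask = expert_ids == idx
            out[mask] = expert(tokens[mask].to(self.device)).to(tokens.device)
            self.experts[idx].to("cpu")                  # evict to free GPU memory
        return out
```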
