Pre-trained models: Past, present and future

Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Yuan Yao, Ao Zhang, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Jun Zhu
{"title":"Pre-trained models: Past, present and future","authors":"Xu Han ,&nbsp;Zhengyan Zhang ,&nbsp;Ning Ding ,&nbsp;Yuxian Gu ,&nbsp;Xiao Liu ,&nbsp;Yuqi Huo ,&nbsp;Jiezhong Qiu ,&nbsp;Yuan Yao ,&nbsp;Ao Zhang ,&nbsp;Liang Zhang ,&nbsp;Wentao Han ,&nbsp;Minlie Huang ,&nbsp;Qin Jin ,&nbsp;Yanyan Lan ,&nbsp;Yang Liu ,&nbsp;Zhiyuan Liu ,&nbsp;Zhiwu Lu ,&nbsp;Xipeng Qiu ,&nbsp;Ruihua Song ,&nbsp;Jie Tang ,&nbsp;Jun Zhu","doi":"10.1016/j.aiopen.2021.08.002","DOIUrl":null,"url":null,"abstract":"<div><p>Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 225-250"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.08.002","citationCount":"351","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI Open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666651021000231","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 351

Abstract

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge numbers of model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge in their parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in those parameters can benefit a variety of downstream tasks, as has been extensively demonstrated through experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than to learn models from scratch. In this paper, we take a deep look into the history of pre-training, especially its close relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. Driven by the surge of computational power and the increasing availability of data, these breakthroughs move toward four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions for PTMs, and we hope our view can inspire and advance the future study of PTMs.
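The abstract describes the pre-train-then-fine-tune paradigm: a PTM is adopted as the backbone and adapted to a downstream task instead of training a model from scratch. As an illustration only, the following is a minimal sketch of that workflow, assuming the Hugging Face transformers library and PyTorch; the model name, toy data, and hyperparameters are illustrative assumptions, not taken from the paper.

import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained model as the backbone instead of training from scratch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy labeled data standing in for a real downstream task.
texts = ["the movie was great", "the plot made no sense"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Fine-tuning loop: the knowledge already stored in the pre-trained
# parameters is adapted to the specific task with a few gradient steps.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()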
