On the Generalization Ability of Unsupervised Pretraining

Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi
{"title":"On the Generalization Ability of Unsupervised Pretraining","authors":"Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi","doi":"arxiv-2403.06871","DOIUrl":null,"url":null,"abstract":"Recent advances in unsupervised learning have shown that unsupervised\npre-training, followed by fine-tuning, can improve model generalization.\nHowever, a rigorous understanding of how the representation function learned on\nan unlabeled dataset affects the generalization of the fine-tuned model is\nlacking. Existing theoretical research does not adequately account for the\nheterogeneity of the distribution and tasks in pre-training and fine-tuning\nstage. To bridge this gap, this paper introduces a novel theoretical framework\nthat illuminates the critical factor influencing the transferability of\nknowledge acquired during unsupervised pre-training to the subsequent\nfine-tuning phase, ultimately affecting the generalization capabilities of the\nfine-tuned model on downstream tasks. We apply our theoretical framework to\nanalyze generalization bound of two distinct scenarios: Context Encoder\npre-training with deep neural networks and Masked Autoencoder pre-training with\ndeep transformers, followed by fine-tuning on a binary classification task.\nFinally, inspired by our findings, we propose a novel regularization method\nduring pre-training to further enhances the generalization of fine-tuned model.\nOverall, our results contribute to a better understanding of unsupervised\npre-training and fine-tuning paradigm, and can shed light on the design of more\neffective pre-training algorithms.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2403.06871","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distributions and tasks across the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, and ultimately the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze the generalization bounds of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method applied during pre-training to further enhance the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
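
To make the paradigm the abstract analyzes concrete, below is a minimal, self-contained PyTorch sketch of masked-autoencoder-style unsupervised pre-training followed by fine-tuning on a binary classification task. It is not the authors' construction or their regularization method; the architectures, masking ratio, synthetic data, and hyperparameters are purely illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): stage 1 learns an encoder by
# reconstructing masked inputs on unlabeled data; stage 2 fine-tunes that
# encoder plus a linear head on a labeled binary classification task.
import torch
import torch.nn as nn

d, hidden = 32, 64

# Encoder is kept after pre-training; the decoder is discarded.
encoder = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
decoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, d))

# ---- Stage 1: unsupervised pre-training (masked reconstruction) ----
X_unlabeled = torch.randn(1024, d)  # stand-in for the unlabeled corpus
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    mask = (torch.rand_like(X_unlabeled) > 0.25).float()  # hide ~25% of coordinates
    recon = decoder(encoder(X_unlabeled * mask))
    loss = ((recon - X_unlabeled) ** 2).mean()            # reconstruction objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# ---- Stage 2: supervised fine-tuning on a (possibly shifted) downstream task ----
X_labeled = torch.randn(256, d)
y = (X_labeled.sum(dim=1) > 0).float()                    # synthetic binary labels
head = nn.Linear(hidden, 1)
opt_ft = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for _ in range(200):
    logits = head(encoder(X_labeled)).squeeze(-1)
    loss = bce(logits, y)
    opt_ft.zero_grad()
    loss.backward()
    opt_ft.step()
```

The heterogeneity the paper emphasizes enters exactly at the hand-off between the two stages: the encoder is trained on one (unlabeled) distribution and objective, then reused under a different (labeled) distribution and objective.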