{"title":"论无监督预训练的泛化能力","authors":"Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi","doi":"arxiv-2403.06871","DOIUrl":null,"url":null,"abstract":"Recent advances in unsupervised learning have shown that unsupervised\npre-training, followed by fine-tuning, can improve model generalization.\nHowever, a rigorous understanding of how the representation function learned on\nan unlabeled dataset affects the generalization of the fine-tuned model is\nlacking. Existing theoretical research does not adequately account for the\nheterogeneity of the distribution and tasks in pre-training and fine-tuning\nstage. To bridge this gap, this paper introduces a novel theoretical framework\nthat illuminates the critical factor influencing the transferability of\nknowledge acquired during unsupervised pre-training to the subsequent\nfine-tuning phase, ultimately affecting the generalization capabilities of the\nfine-tuned model on downstream tasks. We apply our theoretical framework to\nanalyze generalization bound of two distinct scenarios: Context Encoder\npre-training with deep neural networks and Masked Autoencoder pre-training with\ndeep transformers, followed by fine-tuning on a binary classification task.\nFinally, inspired by our findings, we propose a novel regularization method\nduring pre-training to further enhances the generalization of fine-tuned model.\nOverall, our results contribute to a better understanding of unsupervised\npre-training and fine-tuning paradigm, and can shed light on the design of more\neffective pre-training algorithms.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Generalization Ability of Unsupervised Pretraining\",\"authors\":\"Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi\",\"doi\":\"arxiv-2403.06871\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in unsupervised learning have shown that unsupervised\\npre-training, followed by fine-tuning, can improve model generalization.\\nHowever, a rigorous understanding of how the representation function learned on\\nan unlabeled dataset affects the generalization of the fine-tuned model is\\nlacking. Existing theoretical research does not adequately account for the\\nheterogeneity of the distribution and tasks in pre-training and fine-tuning\\nstage. To bridge this gap, this paper introduces a novel theoretical framework\\nthat illuminates the critical factor influencing the transferability of\\nknowledge acquired during unsupervised pre-training to the subsequent\\nfine-tuning phase, ultimately affecting the generalization capabilities of the\\nfine-tuned model on downstream tasks. 
We apply our theoretical framework to\\nanalyze generalization bound of two distinct scenarios: Context Encoder\\npre-training with deep neural networks and Masked Autoencoder pre-training with\\ndeep transformers, followed by fine-tuning on a binary classification task.\\nFinally, inspired by our findings, we propose a novel regularization method\\nduring pre-training to further enhances the generalization of fine-tuned model.\\nOverall, our results contribute to a better understanding of unsupervised\\npre-training and fine-tuning paradigm, and can shed light on the design of more\\neffective pre-training algorithms.\",\"PeriodicalId\":501301,\"journal\":{\"name\":\"arXiv - CS - Machine Learning\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2403.06871\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2403.06871","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the Generalization Ability of Unsupervised Pretraining
Recent advances in unsupervised learning have shown that unsupervised
pre-training, followed by fine-tuning, can improve model generalization.
However, a rigorous understanding of how the representation function learned on
an unlabeled dataset affects the generalization of the fine-tuned model is
lacking. Existing theoretical research does not adequately account for the
heterogeneity of the distributions and tasks across the pre-training and
fine-tuning stages. To bridge this gap, this paper introduces a novel
theoretical framework
that illuminates the critical factor influencing the transferability of
knowledge acquired during unsupervised pre-training to the subsequent
fine-tuning phase, ultimately affecting the generalization capabilities of the
fine-tuned model on downstream tasks. We apply our theoretical framework to
analyze the generalization bounds of two distinct scenarios: Context Encoder
pre-training with deep neural networks and Masked Autoencoder pre-training with
deep transformers, followed by fine-tuning on a binary classification task.
Finally, inspired by our findings, we propose a novel regularization method,
applied during pre-training, that further enhances the generalization of the
fine-tuned model. Overall, our results contribute to a better understanding of
the unsupervised pre-training and fine-tuning paradigm, and can shed light on
the design of more
effective pre-training algorithms.
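
As a purely illustrative sketch of the paradigm the abstract describes (not the
paper's actual construction or analysis), the snippet below pre-trains a small
masked-autoencoder-style model on unlabeled data and then fine-tunes the learned
encoder with a linear head on a binary classification task. The architectures,
losses, masking ratio, and synthetic data are placeholder assumptions chosen
only to make the two-stage pipeline concrete.

# Hypothetical two-stage pipeline: unsupervised (masked-reconstruction)
# pre-training followed by supervised fine-tuning. Not the paper's method.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 32                                      # input dimension (placeholder)
X_unlabeled = torch.randn(1024, d)          # unlabeled pre-training data (synthetic)
X_labeled = torch.randn(128, d)             # labeled downstream data (synthetic)
y_labeled = (X_labeled[:, 0] > 0).float()   # synthetic binary labels

# Representation function learned during pre-training, plus a decoder used
# only for the reconstruction objective.
encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 16))
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, d))

# Stage 1: unsupervised pre-training -- mask a fraction of each input's
# coordinates and train encoder/decoder to reconstruct the masked part.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    mask = (torch.rand_like(X_unlabeled) > 0.25).float()     # keep ~75% of entries
    recon = decoder(encoder(X_unlabeled * mask))
    loss = ((recon - X_unlabeled) ** 2 * (1 - mask)).mean()  # error on masked entries only
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tuning -- attach a linear head to the pre-trained encoder and
# train on the labeled binary classification task.
head = nn.Linear(16, 1)
opt_ft = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for _ in range(200):
    logits = head(encoder(X_labeled)).squeeze(-1)
    loss = bce(logits, y_labeled)
    opt_ft.zero_grad(); loss.backward(); opt_ft.step()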