On the Generalization for Transfer Learning: An Information-Theoretic Analysis

Impact factor: 2.2 · Zone 3 (Computer Science) · JCR Q3 (Computer Science, Information Systems) · IEEE Transactions on Information Theory · Pub Date: 2024-08-14 · DOI: 10.1109/TIT.2024.3441574
Xuetong Wu;Jonathan H. Manton;Uwe Aickelin;Jingge Zhu
IEEE Transactions on Information Theory, vol. 70, no. 10, pp. 7089-7124. Journal article. https://ieeexplore.ieee.org/document/10636241/
Citation count: 0

Abstract

Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu \| \mu')$ plays an important role in the characterizations, where $\mu$ and $\mu'$ denote the distributions of the training data and the testing data, respectively. Specifically, we provide generalization error and excess risk upper bounds for learning algorithms where data from both distributions are available in the training phase. Recognizing that the bounds could be sub-optimal in general, we provide improved excess risk upper bounds for a certain class of algorithms, including the empirical risk minimization (ERM) algorithm, by making stronger assumptions through the central condition. To demonstrate the usefulness of the bounds, we further extend the analysis to the Gibbs algorithm and the noisy stochastic gradient descent method. We then generalize the mutual information bound with other divergences such as the $\phi$-divergence and the Wasserstein distance, which may lead to tighter bounds and can handle the case when $\mu$ is not absolutely continuous with respect to $\mu'$. Several numerical results are provided to demonstrate our theoretical findings. Lastly, to address the problem that the bounds are often not directly applicable in practice due to the absence of distributional knowledge of the data, we develop an algorithm (called InfoBoost) that dynamically adjusts the importance weights for both source and target data based on certain information measures. The empirical results show the effectiveness of the proposed algorithm.
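As a small illustration of the quantity $D(\mu \| \mu')$ named in the abstract, the sketch below computes the KL divergence between two discrete distributions in plain Python. It is not taken from the paper: the function name kl_divergence and the two example distributions are hypothetical, chosen only to show that the divergence is finite when the support of the first argument lies inside the support of the second, and infinite otherwise. The infinite case is the lack of absolute continuity that the abstract says motivates alternatives such as the Wasserstein distance.

```python
import math


def kl_divergence(p, q):
    """Return D(p || q) in nats for two discrete distributions.

    p and q are equal-length sequences of probabilities over the same outcomes.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0.0:
            return math.inf  # p is not absolutely continuous w.r.t. q
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions, for illustration only.
mu = [0.5, 0.5, 0.0]          # plays the role of the training distribution
mu_prime = [0.25, 0.25, 0.5]  # plays the role of the testing distribution

# Finite: every outcome with mass under mu also has mass under mu_prime.
print(kl_divergence(mu, mu_prime))

# Infinite: with the roles swapped, the first argument puts mass on an
# outcome that the second argument assigns probability zero.
print(kl_divergence(mu_prime, mu))
```

The divergence is asymmetric, so which distribution appears first matters: in the abstract's notation it is the training distribution $\mu$ that must be absolutely continuous with respect to the testing distribution $\mu'$ for $D(\mu \| \mu')$ to be finite.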
Source journal
IEEE Transactions on Information Theory (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 5.70
Self-citation rate: 20.00%
Articles published: 514
Review time: 12 months
Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.