Testing sufficiency for transfer learning

Computational Statistics & Data Analysis; IF 1.5; Zone 3, Mathematics; JCR Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2024-10-16; DOI: 10.1016/j.csda.2024.108075
Ziqian Lin, Yuan Gao, Feifei Wang, Hansheng Wang
Computational Statistics & Data Analysis, volume 203, Article 108075. https://www.sciencedirect.com/science/article/pii/S0167947324001592

Abstract

Modern statistical analysis often involves high-dimensional models fitted to target data with limited sample sizes, which makes accurate estimation difficult. How to borrow information from a separate, large source dataset to estimate the target model more accurately is therefore an interesting problem, and it motivates the idea of transfer learning. Various estimation methods of this kind have been developed recently. In this work, we study transfer learning from a different perspective: we consider the problem of testing for transfer learning sufficiency. We take transfer learning sufficiency as the null hypothesis. It refers to the situation in which, with the help of the source data, the useful information contained in the feature vectors of the target data can be sufficiently extracted for predicting the target response of interest. Rejecting the null hypothesis therefore implies that information useful for prediction remains in the feature vectors of the target data and calls for further exploration. To this end, we develop a novel testing procedure and a centered and standardized test statistic, whose asymptotic null distribution is derived analytically. Simulation studies demonstrate the finite-sample performance of the proposed method. A real-data example related to deep learning is presented for illustration.
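
To make the null hypothesis concrete, the following is a minimal sketch in Python. It is not the test statistic proposed in the paper, which the abstract does not specify. It assumes that the source data have already been condensed into a one-dimensional predictive score z for each target observation, and it uses a simple permutation test to ask whether the target features X still correlate with the response after z has been accounted for. The function name, the simulated data, and the choice of a permutation test are all illustrative assumptions.

```python
import numpy as np

def residual_permutation_test(X, y, z, n_perm=500, seed=0):
    """Permutation p-value for 'X adds nothing beyond the source-based score z'.

    Illustrative stand-in only; this is NOT the statistic from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Regress y on the source-based score (with intercept) and keep residuals.
    Z = np.column_stack([np.ones(n), z])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    # Remove from each feature the part that is linearly explained by z.
    Xr = X - Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

    def stat(r):
        # Sum of squared sample covariances between residuals and features.
        return float(np.sum((Xr.T @ r / n) ** 2))

    observed = stat(resid)
    permuted = np.array([stat(rng.permutation(resid)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(permuted >= observed)) / (1 + n_perm)

# Hypothetical simulated target data: 200 observations, 10 features.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_source = np.zeros(p)
beta_source[:3] = 1.0            # assumed source-based coefficients
z = X @ beta_source              # score transferred from the source model

y_null = z + rng.normal(size=n)  # sufficiency holds: z captures all signal
y_alt = y_null + 1.0 * X[:, 5]   # sufficiency fails: feature 5 adds signal

p_null = residual_permutation_test(X, y_null, z)
p_alt = residual_permutation_test(X, y_alt, z)
print("p-value under sufficiency:", p_null)
print("p-value with leftover signal:", p_alt)
```

In this toy setting a small p-value for y_alt indicates that predictive information remains in the target features, which corresponds to rejecting the null hypothesis described above. The paper instead derives an asymptotic null distribution for its own statistic, so no permutation step would be needed there.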
Source journal
Computational Statistics & Data Analysis (Mathematics; Computer Science, Interdisciplinary Applications)
CiteScore: 3.70
Self-citation rate: 5.60%
Articles published: 167
Review time: 60 days
About the journal: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer-intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge-based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model-free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]
Latest articles from this journal
Editorial Board
Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation
Confidence intervals for tree-structured varying coefficients
Efficient computation of sparse and robust maximum association estimators
Functional time transformation model with applications to digital health