Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method

Stat | IF 0.7 | Zone 4 (Mathematics) | JCR Q3, STATISTICS & PROBABILITY | Pub Date: 2024-06-23 | DOI: 10.1002/sta4.707
Jin Su, Shuyi Zhang, Yong Zhou
Citations: 0

Abstract

We propose an estimator for the population mean in the semi‐supervised learning setting under the Missing at Random (MAR) assumption. This setting assumes that the probability of observing the label depends on the total sample size. To estimate the population mean efficiently, we introduce an adaptive estimator based on inverse probability weighting and cross‐fitting. Theoretical analysis shows that the proposed estimator is consistent and efficient, with a convergence rate slower than the typical rate because the proportion of labelled data diminishes as the sample size increases in the semi‐supervised setting. We also prove the consistency of inverse probability weighting (IPW)–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
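The idea of combining inverse probability weighting with cross‐fitting can be illustrated with a minimal sketch. This is a generic textbook-style construction, not the authors' adaptive estimator: the logistic model for the labelling probability, the number of folds, the self-normalised form of the weighted mean, the fixed labelling rate, and the simulated data are all assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, r, n_iter=50, ridge=1e-6):
    """Fit P(R = 1 | X) by Newton-Raphson logistic regression with an intercept."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (r - p)
        # Small ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict_proba(beta, X):
    """Predicted labelling probability for each row of X."""
    Z = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def cross_fitted_ipw_mean(X, y, r, n_folds=5, seed=0):
    """Self-normalised IPW mean of y, with labelling probabilities fitted out of fold."""
    rng = np.random.default_rng(seed)
    n = len(r)
    folds = np.array_split(rng.permutation(n), n_folds)
    pi_hat = np.empty(n)
    for k in range(n_folds):
        held_out = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        beta = fit_logistic(X[train], r[train])
        pi_hat[held_out] = predict_proba(beta, X[held_out])
    pi_hat = np.clip(pi_hat, 1e-6, 1.0)  # guard against division by ~0
    weights = r / pi_hat                 # zero weight for unlabelled units
    y_filled = np.where(r == 1, y, 0.0)  # unlabelled y (NaN) never enters the sum
    return np.sum(weights * y_filled) / np.sum(weights)


# Simulated example (assumed design): labels are missing at random given X.
rng = np.random.default_rng(42)
n = 20_000
X = rng.normal(size=(n, 1))
y_full = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)  # true population mean is 1.0
pi_true = 1.0 / (1.0 + np.exp(-(-2.0 + X[:, 0])))   # labelling depends on X only
r = (rng.random(n) < pi_true).astype(float)
y = np.where(r == 1, y_full, np.nan)                # unlabelled responses are unobserved

naive = np.nanmean(y)                    # labelled-only mean, ignores the MAR mechanism
ipw = cross_fitted_ipw_mean(X, y, r)
print(f"labelled-only mean: {naive:.3f}, cross-fitted IPW mean: {ipw:.3f}")
```

In this simulated design the labelled units tend to have larger covariate values, so the labelled-only mean drifts away from the population mean, while reweighting each labelled unit by the inverse of its estimated labelling probability pulls the estimate back. Fitting that probability on the other folds keeps each unit's weight independent of its own response, which is the role cross‐fitting plays in this sketch.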
Source journal
Stat
Subject area: Decision Sciences - Statistics, Probability and Uncertainty
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 85
Journal description: Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell. Stat is characterised by:
• Speed - a high-quality review process that aims to reach a decision within 20 days of submission.
• Concision - a maximum article length of 10 pages of text, not including references.
• Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images.
• Scope - addresses all areas of statistics and interdisciplinary areas.
Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.
Latest articles from this journal
Communication‐Efficient Distributed Estimation of Causal Effects With High‐Dimensional Data
A Joint Temporal Model for Hospitalizations and ICU Admissions Due to COVID‐19 in Quebec
Bitcoin Price Prediction Using Deep Bayesian LSTM With Uncertainty Quantification: A Monte Carlo Dropout–Based Approach
Exact interval estimation for three parameters subject to false positive misclassification
Novel Closed‐Form Point Estimators for a Weighted Exponential Family Derived From Likelihood Equations