Fighting selection bias in statistical learning: application to visual recognition from biased image databases

Journal of Nonparametric Statistics (IF 0.8; Zone 4, Mathematics; JCR Q3, Statistics & Probability). Pub Date: 2023-09-19. DOI: 10.1080/10485252.2023.2259011
Stephan Clémençon, Pierre Laforgue, Robin Vogel
Citations: 0

Abstract

In practice, and especially when training deep neural networks, visual recognition rules are often learned based on various sources of information. On the other hand, the recent deployment of facial recognition systems with uneven performances on different population segments has highlighted the representativeness issues induced by a naive aggregation of the datasets. In this paper, we show how biasing models can remedy these problems. Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations, so as to form a nearly debiased estimator of the target distribution. One key condition is that the supports of the biased distributions must partly overlap, and cover the support of the target distribution. In order to meet this requirement in practice, we propose to use a low-dimensional image representation, shared across the image databases. Finally, we provide numerical experiments highlighting the relevance of our approach.

Keywords: sampling bias; selection effect; visual recognition; reliable statistical learning

Disclosure statement: No potential conflict of interest was reported by the author(s).

Funding: This work was partially supported by the research chair ‘Good In Tech: Rethinking innovation and technology as drivers of a better world for and by humans’, under the auspices of the ‘Fondation du Risque’ and in partnership with the Institut Mines-Télécom, Sciences Po, Afnor, Ag2r La Mondiale, CGI France, Danone and Sycomore.
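The reweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a single biased sample and a fully known, strictly positive biasing function, and it uses plain self-normalized inverse weighting. The function `omega`, the uniform target distribution, and the sample size are all hypothetical choices made for the illustration.

```python
import random

def omega(z):
    """Hypothetical known biasing function: probability that z is kept."""
    return 0.2 + 0.8 * z

def draw_biased_sample(n, rng):
    """Draw from the target Uniform(0, 1), keeping each point with probability omega(z)."""
    sample = []
    while len(sample) < n:
        z = rng.random()
        if rng.random() < omega(z):
            sample.append(z)
    return sample

def reweighted_mean(sample, f, omega):
    """Self-normalized inverse-weighting estimate of the target expectation of f."""
    weights = [1.0 / omega(z) for z in sample]
    total = sum(weights)
    return sum(w * f(z) for w, z in zip(weights, sample)) / total

rng = random.Random(0)
sample = draw_biased_sample(20000, rng)

# The naive average estimates the mean of the biased distribution (11/18),
# while the reweighted average targets the mean of Uniform(0, 1), i.e. 0.5.
naive = sum(sample) / len(sample)
debiased = reweighted_mean(sample, lambda z: z, omega)
print(f"naive mean: {naive:.3f}, reweighted mean: {debiased:.3f}")
```

In this toy setting the biasing function is positive everywhere on the target support, which plays the role of the support condition mentioned in the abstract: a region where `omega` vanished would never be observed, and no reweighting could recover it.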
Source journal
Journal of Nonparametric Statistics (Mathematics: Statistics & Probability)
CiteScore: 1.50
Self-citation rate: 8.30%
Articles published: 42
Review time: 6-12 weeks
Journal description: Journal of Nonparametric Statistics provides a medium for the publication of research and survey work in nonparametric statistics and related areas. The scope includes, but is not limited to, the following topics: Nonparametric modeling, Nonparametric function estimation, Rank and other robust and distribution-free procedures, Resampling methods, Lack-of-fit testing, Multivariate analysis, Inference with high-dimensional data, Dimension reduction and variable selection, Methods for errors in variables, missing, censored, and other incomplete data structures, Inference of stochastic processes, Sample surveys, Time series analysis, Longitudinal and functional data analysis, Nonparametric Bayes methods and decision procedures, Semiparametric models and procedures, Statistical methods for imaging and tomography, Statistical inverse problems, Financial statistics and econometrics, Bioinformatics and comparative genomics, Statistical algorithms and machine learning. Both the theory and applications of nonparametric statistics are covered in the journal. Research applying nonparametric methods to medicine, engineering, technology, science and humanities is welcomed, provided the novelty and quality level are of the highest order. Authors are encouraged to submit supplementary technical arguments, computer code, data analysed in the paper or any additional information for online publication along with the published paper.
Latest articles from this journal
Adaptive and efficient isotonic estimation in Wicksell's problem
A general semi-parametric elliptical distribution model for semi-supervised learning
Stone's theorem for distributional regression in Wasserstein distance
Kernel density estimation for a stochastic process with values in a Riemannian manifold
Functional index coefficient models for locally stationary time series