Effect of covariate shift on multi-class classification of Fermi-LAT sources

Dmitry V Malyshev
RAS Techniques and Instruments (journal article), published 2023-11-13. DOI: 10.1093/rasti/rzad053, https://doi.org/10.1093/rasti/rzad053

Abstract

Probabilistic classification of unassociated Fermi-LAT sources with machine learning methods implicitly assumes that associated and unassociated sources follow the same distribution as a function of source parameters, which is not the case for the Fermi-LAT catalogs. A difference between the distributions of the training and testing (or target) datasets as a function of the input features (covariates) is known as covariate shift. In this paper, we present the first quantitative estimate of the effect of covariate shift on the multi-class classification of Fermi-LAT sources. We introduce sample weights proportional to the ratio of the unassociated to the associated source probability density functions, so that associated sources in regions densely populated with unassociated sources receive more weight than associated sources in regions with few unassociated sources. We find that covariate shift has relatively little effect on the predicted probabilities, i.e., training can be performed with either weighted or unweighted samples, which is generally expected for covariate shift problems. The main effect of covariate shift is on the estimated performance of the classification. Depending on the class, covariate shift can reduce precision and recall by up to 10–20% compared with estimates that do not take it into account.
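The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the single toy feature, the fixed-bandwidth Gaussian kernel density estimate, and the function names `gaussian_kde_1d`, `covariate_shift_weights`, and `weighted_precision_recall` are all assumptions made for illustration. The sketch assigns each associated source a weight proportional to the ratio of the unassociated to the associated density at its position, and then uses those weights when computing precision and recall.

```python
import numpy as np


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a 1D Gaussian KDE built from `samples`."""
    samples = np.asarray(samples, dtype=float)

    def pdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - samples[None, :]) / bandwidth
        kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
        return kernels.mean(axis=1)

    return pdf


def covariate_shift_weights(x_assoc, x_unassoc, bandwidth=0.3, eps=1e-12):
    """Weights for associated sources: p_unassoc(x) / p_assoc(x), scaled to mean 1."""
    p_assoc = gaussian_kde_1d(x_assoc, bandwidth)(x_assoc)
    p_unassoc = gaussian_kde_1d(x_unassoc, bandwidth)(x_assoc)
    w = p_unassoc / np.maximum(p_assoc, eps)  # guard against division by zero
    return w / w.mean()


def weighted_precision_recall(y_true, y_pred, cls, weights):
    """Precision and recall for class `cls`, with each source counted by its weight."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = np.asarray(weights, dtype=float)
    tp = weights[(y_true == cls) & (y_pred == cls)].sum()
    predicted = weights[y_pred == cls].sum()
    actual = weights[y_true == cls].sum()
    precision = tp / predicted if predicted > 0 else float("nan")
    recall = tp / actual if actual > 0 else float("nan")
    return precision, recall


# Toy data (assumed, not from the catalogs): associated sources are shifted
# towards larger feature values than unassociated ones.
rng = np.random.default_rng(0)
x_assoc = rng.normal(1.0, 1.0, 500)
x_unassoc = rng.normal(0.0, 1.0, 500)
weights = covariate_shift_weights(x_assoc, x_unassoc)
```

In this toy setup, associated sources at small feature values, where unassociated sources are dense, receive larger weights. Passing the same weights to `weighted_precision_recall` gives performance estimates that refer to the unassociated population rather than to the associated one.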