Real-bogus scores for active anomaly detection

T. A. Semenikhin, M. V. Kornilov, M. V. Pruzhinskaya, A. D. Lavrukhina, E. Russeil, E. Gangler, E. E. O. Ishida, V. S. Korolev, K. L. Malanchev, A. A. Volnova, S. Sreejith (The SNAD team)
arXiv - PHYS - Instrumentation and Methods for Astrophysics, published 2024-09-16
DOI: arxiv-2409.10256 (https://doi.org/arxiv-2409.10256)

Abstract

In anomaly detection for modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects in a large volume of data. Unfortunately, artifacts (such as plane or satellite tracks, bad columns on CCDs, and ghosts) often constitute significant contaminants in the results of anomaly detection analysis. In such contexts, the Active Anomaly Discovery (AAD) algorithm allows the output of anomaly detection pipelines to be tailored to what the expert judges to be scientifically interesting. We demonstrate how the introduction of real-bogus scores, obtained from a machine learning classifier, improves the results from AAD. Using labeled data from the SNAD ZTF knowledge database, we train four real-bogus classifiers: XGBoost, CatBoost, Random Forest, and Extremely Randomized Trees. All the models perform real-bogus classification with similar effectiveness, achieving ROC-AUC scores ranging from 0.93 to 0.95. Since no model clearly outperforms the others, we select the Random Forest model as the main model for its simplicity and interpretability. The Random Forest classifier is applied to 67 million light curves from ZTF DR17. The output real-bogus score is used as an additional feature for two anomaly detection algorithms: static Isolation Forest and AAD. While the results from Isolation Forest remain unchanged, the number of artifacts detected by the active approach decreases significantly with the inclusion of the real-bogus score, from 27 to 3 out of 100. We conclude that incorporating the real-bogus classifier result as an additional feature in the active anomaly detection pipeline significantly reduces the number of artifacts in the outputs, thereby increasing the incidence of astrophysically interesting objects presented to human experts.
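The feature-augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the data are synthetic Gaussian stand-ins rather than ZTF light-curve features, all variable names and hyperparameters are assumptions, and only the static Isolation Forest is shown. The AAD expert-feedback loop, which is where the abstract reports the improvement, is not implemented here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for labeled light-curve features (NOT ZTF data).
n_per_class, n_features = 1000, 5
X_real = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
X_bogus = rng.normal(1.5, 1.0, size=(n_per_class, n_features))
X_labeled = np.vstack([X_real, X_bogus])
# Label convention assumed here: 1 = real object, 0 = bogus (artifact).
y_labeled = np.concatenate([
    np.ones(n_per_class, dtype=int),
    np.zeros(n_per_class, dtype=int),
])

X_train, X_test, y_train, y_test = train_test_split(
    X_labeled, y_labeled, test_size=0.25, random_state=0, stratify=y_labeled
)

# Step 1: train a real-bogus classifier on the labeled sample.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# The predicted probability of the "real" class is used as the score.
real_col = list(clf.classes_).index(1)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, real_col])

# Step 2: score an unlabeled "survey" sample and append the score
# as one extra feature column.
X_survey = rng.normal(0.5, 1.5, size=(5000, n_features))
rb_score = clf.predict_proba(X_survey)[:, real_col]
X_augmented = np.column_stack([X_survey, rb_score])

# Step 3: run a static Isolation Forest on the augmented feature set.
forest = IsolationForest(n_estimators=100, random_state=0)
forest.fit(X_augmented)
anomaly_score = forest.score_samples(X_augmented)  # lower = more anomalous

# Indices of the 100 most anomalous objects, as would be shown to an expert.
top100 = np.argsort(anomaly_score)[:100]
print(f"held-out ROC-AUC: {auc:.3f}")
print(f"augmented feature matrix shape: {X_augmented.shape}")
```

In an active setting, an expert would label the top-ranked candidates and the detector would be updated from that feedback; the extra real-bogus column gives such a loop a direct handle for down-weighting artifacts.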