Real-bogus scores for active anomaly detection
T.A. Semenikhin, M.V. Kornilov, M.V. Pruzhinskaya, A.D. Lavrukhina, E. Russeil, E. Gangler, E.E.O. Ishida, V.S. Korolev, K.L. Malanchev, A.A. Volnova, S. Sreejith, SNAD team
Astronomy and Computing, volume 51, Article 100919. Published 2024-12-11.
DOI: 10.1016/j.ascom.2024.100919
https://www.sciencedirect.com/science/article/pii/S2213133724001343
Impact factor: 1.9; JCR: Q2 (ASTRONOMY & ASTROPHYSICS)
Citations: 0
Abstract
In anomaly detection for modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts (such as plane or satellite tracks, bad columns on CCDs, and ghosts) often constitute significant contaminants in the results of anomaly detection analysis. In such contexts, the Active Anomaly Discovery (AAD) algorithm allows the output of anomaly detection pipelines to be tailored to what the expert judges to be scientifically interesting. We demonstrate how the introduction of real-bogus scores, obtained from a machine learning classifier, improves the results from AAD. Using labeled data from the SNAD ZTF knowledge database, we train four real-bogus classifiers: XGBoost, CatBoost, Random Forest, and Extremely Randomized Trees. All the models perform real-bogus classification with similar effectiveness, achieving ROC-AUC scores ranging from 0.93 to 0.95. Since performance is comparable, we select the Random Forest model as the main model for its simplicity and interpretability. The Random Forest classifier is applied to 67 million light curves from ZTF DR17. The output real-bogus score is used as an additional feature for two anomaly detection algorithms: static Isolation Forest and AAD. With the real-bogus score included, the number of artifacts detected by both algorithms decreases significantly in cases where the feature space regions are densely populated with artifacts. However, it remains almost unchanged in scenarios where the overall number of artifacts in the outputs is already small. We conclude that incorporating the real-bogus classifier result as an additional feature in the active anomaly detection pipeline reduces the number of artifacts in the outputs, thereby increasing the incidence of astrophysically interesting objects presented to human experts.
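The core idea of the abstract, using a classifier's real-bogus score as one more input feature for an anomaly detector, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic Gaussian clusters rather than ZTF light-curve features, the score is taken to be the predicted probability of the "real" class, and scikit-learn's `RandomForestClassifier` and `IsolationForest` stand in for the authors' actual pipeline, whose details the abstract does not give. AAD itself is not shown.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
n_features = 4  # hypothetical number of light-curve features

# Synthetic labeled set standing in for an expert-labeled database.
# Label 1 = real object, label 0 = bogus (artifact). Not real survey data.
X_labeled = np.vstack([
    rng.normal(0.0, 1.0, size=(500, n_features)),  # "real" cluster
    rng.normal(3.0, 1.0, size=(100, n_features)),  # "bogus" cluster
])
y_labeled = np.concatenate([np.ones(500, dtype=int), np.zeros(100, dtype=int)])

# Step 1: train a real-bogus classifier on the labeled set.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_labeled, y_labeled)

# Synthetic unlabeled "survey" sample: 300 real-like rows, then 30 bogus-like rows.
X_survey = np.vstack([
    rng.normal(0.0, 1.0, size=(300, n_features)),
    rng.normal(3.0, 1.0, size=(30, n_features)),
])

# Step 2: real-bogus score, assumed here to be P(class == real).
real_column = list(clf.classes_).index(1)
rb_score = clf.predict_proba(X_survey)[:, real_column]

# Step 3: append the score as an additional feature column.
X_aug = np.column_stack([X_survey, rb_score])

# Step 4: run a static Isolation Forest on the augmented feature space.
iso = IsolationForest(n_estimators=100, random_state=0)
iso.fit(X_aug)
anomaly_score = iso.score_samples(X_aug)  # lower value = more anomalous

# Indices of the ten most anomalous rows, i.e. what an expert would inspect first.
top_candidates = np.argsort(anomaly_score)[:10]
```

In this sketch the extra column only gives the detector more information about where artifacts sit in feature space; it does not by itself demote them. In an active setting, expert feedback on the top candidates would be what steers the ranking away from artifact-dominated regions.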
Journal: Astronomy and Computing
Categories: ASTRONOMY & ASTROPHYSICS; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
CiteScore: 4.10
Self-citation rate: 8.00%
Articles published: 67
Journal description:
Astronomy and Computing is a peer-reviewed journal that focuses on the broad area between astronomy, computer science and information technology. The journal aims to publish the work of scientists and (software) engineers in all aspects of astronomical computing, including the collection, analysis, reduction, visualisation, preservation and dissemination of data, and the development of astronomical software and simulations. The journal covers applications of academic computer science techniques to astronomy, as well as novel applications of information technologies within astronomy.