Classification of real and bogus transients using active learning and semi-supervised learning

Astronomy & Astrophysics (IF 5.4; Zone 2, Physics and Astrophysics; JCR Q1, Astronomy & Astrophysics) Pub Date: 2025-01-09 DOI: 10.1051/0004-6361/202348581
Yating Liu, Lulu Fan, Lei Hu, Junqiang Lu, Yan Lu, Zelin Xu, Jiazheng Zhu, Haochen Wang, Xu Kong
Citations: 0

Abstract

Context. The mounting data stream of large time-domain surveys makes visual inspection of the huge set of transient candidates impractical. Deep-learning-based techniques are popular solutions for minimizing human intervention in the time-domain community. The classification of real and bogus transients is a fundamental component of real-time data processing systems and is critical to enabling rapid follow-up observations. Most existing methods (supervised learning) require sufficiently large training samples with corresponding labels, which involve costly human labeling and are difficult to obtain in the early stages of a time-domain survey. A method that can make use of training samples with only a limited number of labels is therefore highly desirable for future large time-domain surveys, including the forthcoming 2.5-meter Wide-Field Survey Telescope (WFST) six-year survey and the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST).

Aims. Deep-learning-based methods have been favored in astrophysics owing to their adaptability and remarkable performance, and they have been applied to the classification of real and bogus transients. Unlike most existing approaches, which require massive and expensive annotated data, we aim to leverage training samples with only 1000 labels and to discover real sources that vary in brightness over time in the early stages of the WFST six-year survey.

Methods. We present a novel deep learning method that combines active learning and semi-supervised learning to construct a competitive real-bogus classifier. Our method incorporates an active learning stage, in which we select the most informative or uncertain samples for annotation. This stage aims to achieve higher model performance with fewer labeled samples, thus reducing annotation costs and improving the efficiency of the overall learning process. In addition, our approach includes a semi-supervised learning stage that exploits the unlabeled data to improve the model’s performance beyond what is achievable with the limited labeled data alone.

Results. Our proposed methodology capitalizes on the potential of active learning and semi-supervised learning. To demonstrate the efficacy of our approach, we constructed three newly compiled datasets from the Zwicky Transient Facility and achieved average accuracies of 98.8%, 98.8%, and 98.6% on these three datasets. It is important to note that our newly compiled datasets serve only to test our deep learning methodology, and there may be a bias between our datasets and the complete data stream. Therefore, the observed performance on these datasets cannot be assumed to translate directly to the general alert stream for transient detection in actual scenarios. The algorithm will be integrated into the WFST pipeline, enabling efficient and effective classification of transients in the early period of a time-domain survey.
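The two stages described under Methods can be illustrated with a minimal sketch. The abstract does not state which selection criterion or which semi-supervised algorithm the authors use, so this example substitutes two common stand-ins: entropy-based uncertainty sampling for the active learning stage and confidence-thresholded pseudo-labeling for the semi-supervised stage. The function names, the 0.95 threshold, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)


def select_for_annotation(probs, budget):
    """Active learning stand-in: indices of the `budget` most uncertain samples."""
    entropy = predictive_entropy(probs)
    return np.argsort(-entropy, kind="stable")[:budget]


def pseudo_label(probs, threshold=0.95):
    """Semi-supervised stand-in: keep only confident predictions as pseudo-labels."""
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)


# Toy classifier outputs for five unlabeled candidates (columns: bogus, real).
# These numbers are invented for illustration only.
probs = np.array([
    [0.99, 0.01],
    [0.55, 0.45],
    [0.03, 0.97],
    [0.50, 0.50],
    [0.80, 0.20],
])

# Candidates a human would be asked to label next (the most ambiguous ones).
query_idx = select_for_annotation(probs, budget=2)

# Candidates confident enough to be added to training with the model's own label.
keep_idx, labels = pseudo_label(probs, threshold=0.95)

print("send to annotator:", query_idx.tolist())
print("pseudo-labeled:", keep_idx.tolist(), "with labels", labels.tolist())
```

In this toy case the ambiguous candidates go to the annotator while the confident ones are pseudo-labeled, which mirrors the division of labor the abstract describes: human effort is spent where the model is uncertain, and unlabeled data contributes where the model is already confident.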