Automatic Spectral Classification of Stars using Machine Learning: An Approach based on the use of Unbalanced Data

Marco Oyarzo Huichaqueo, Renato Andrés Muñoz Orrego
Journal article, Machine Learning and Applications: An International Journal, published 2022-12-30. DOI: 10.5121/mlaij.2022.9401

Abstract

With the increase in astronomical surveys, astronomers face the challenging task of analyzing large amounts of data to classify observed objects into hard-to-distinguish classes. This article presents a machine learning-based method for the automatic spectral classification of stars from the latest release of the SDSS database. We propose combining spectral data, derived stellar data, and calculated data to create input patterns. Using these patterns as inputs, we develop a Random Forest model that outputs the spectral class of the observed star. Our model classifies data into six complex classes: A, F, G, K, M, and Carbon stars. Because the data are unbalanced, we train our model under three data use cases: the original data, under-sampled data, and over-sampled data. We further test our model using a fixed dataset and a stratified dataset, and we analyze its performance through statistical metrics. The experimental results show that combining these data types into the input pattern improves the prediction scores in all data use cases, and that the model trained with augmented data outperforms the others. Our results suggest that machine learning-based spectral classification of stars may be useful for astronomers.
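The three training-data use cases described above can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions: it uses plain random under-sampling (drop rows until every class matches the smallest) and plain random over-sampling (duplicate rows until every class matches the largest). The abstract does not name the authors' exact techniques, and methods such as SMOTE would behave differently. It also reads "stratified dataset" as a stratified train/test split, which is an assumption. The function names (`stratified_split`, `random_undersample`, `random_oversample`) and the class counts are hypothetical toy values, not SDSS data.

```python
import random
from collections import Counter, defaultdict


def group_by_class(X, y):
    """Group feature rows by class label, preserving first-seen label order."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Hold out the same fraction of every class, so train and test keep the class mix."""
    rng = random.Random(seed)
    X_tr, y_tr, X_te, y_te = [], [], [], []
    for label, rows in group_by_class(X, y).items():
        rows = rows[:]  # copy before shuffling
        rng.shuffle(rows)
        n_test = max(1, round(len(rows) * test_fraction))
        X_te += rows[:n_test]
        y_te += [label] * n_test
        X_tr += rows[n_test:]
        y_tr += [label] * (len(rows) - n_test)
    return X_tr, y_tr, X_te, y_te


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out += rng.sample(rows, target)  # sampling without replacement
        y_out += [label] * target
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each class until it matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]  # with replacement
        X_out += rows + extra
        y_out += [label] * target
    return X_out, y_out


# Hypothetical, deliberately unbalanced toy labels (NOT SDSS counts).
counts = {"A": 30, "F": 60, "G": 80, "K": 50, "M": 20, "Carbon": 5}
y = [label for label, n in counts.items() for _ in range(n)]
X = [[float(i)] for i in range(len(y))]  # placeholder one-feature rows

# Split first, then resample only the training portion.
X_tr, y_tr, X_te, y_te = stratified_split(X, y, test_fraction=0.2)
X_under, y_under = random_undersample(X_tr, y_tr)  # every class: 4 rows
X_over, y_over = random_oversample(X_tr, y_tr)     # every class: 64 rows
print(Counter(y_tr), Counter(y_under), Counter(y_over), sep="\n")
```

Resampling only the training split keeps the test set at the true class mix, so scores measured on it are not inflated by duplicated rows. The balanced training sets would then be passed to a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`), which this sketch leaves out.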