Automatic Spectral Classification of Stars using Machine Learning: An Approach based on the use of Unbalanced Data

Machine Learning and Applications: An International Journal. Pub Date: 2022-12-30. DOI: 10.5121/mlaij.2022.9401

Marco Oyarzo Huichaqueo, Renato Andrés Muñoz Orrego

Abstract

With the growth of astronomical surveys, astronomers face the challenging task of analyzing large amounts of data to classify observed objects into hard-to-distinguish classes. This article presents a machine learning-based method for the automatic spectral classification of stars from the latest release of the SDSS database. We propose combining spectral data, derived stellar data, and calculated data to create input patterns. Using these patterns as inputs, we develop a Random Forest model that outputs the spectral class of the observed star. Our model classifies stars into six complex classes: A, F, G, K, M, and Carbon stars. Because the data are unbalanced, we train our model under three data use cases: the original data, under-sampled data, and over-sampled data. We further test our model on a fixed dataset and on a stratified dataset, and we evaluate its performance with statistical metrics. The experimental results show that using the combined data as the input pattern improves the prediction scores in all data use cases, and that the model trained on over-sampled (augmented) data outperforms the other cases. Our results suggest that machine learning-based spectral classification of stars may be useful for astronomers.
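The abstract does not say how the under-sampling, over-sampling, or stratified split were implemented. As a minimal sketch, assuming plain random under-sampling and random over-sampling by duplication (the authors may have used a different technique, such as synthetic over-sampling), the three data use cases and a stratified split can be illustrated with the Python standard library alone. The function names (`random_undersample`, `random_oversample`, `stratified_split`) and the toy labels are hypothetical and not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def group_by_class(labels):
    """Map each class label to the list of sample indices that carry it."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class shrinks to the size of the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(y)
    n_min = min(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples so every class grows to the size of the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(y)
    n_max = max(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        keep.extend(idx)  # keep every original sample
        keep.extend(rng.choices(idx, k=n_max - len(idx)))  # add duplicates with replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def stratified_split(y, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in group_by_class(y).values():
        idx = idx[:]  # copy before shuffling
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical toy labels with an imbalanced class distribution.
    y = ["G"] * 50 + ["K"] * 30 + ["M"] * 15 + ["Carbon"] * 5
    X = [[float(i)] for i in range(len(y))]  # placeholder one-feature rows

    train_idx, test_idx = stratified_split(y, test_fraction=0.2)
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]

    # Resample only the training portion; the test set keeps the original imbalance.
    print("original:", Counter(y_train))
    print("under-sampled:", Counter(random_undersample(X_train, y_train)[1]))
    print("over-sampled:", Counter(random_oversample(X_train, y_train)[1]))
```

With these toy labels, the stratified split keeps 40, 24, 12, and 4 training samples for G, K, M, and Carbon. Under-sampling reduces every class to 4, and over-sampling raises every class to 40. In this sketch, resampling is applied only after the split, so duplicated rows cannot leak into the test set. A balanced training set produced this way could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`. The abstract does not name the implementation the authors used.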
