Machine learning technique for morphological classification of galaxies from the SDSS. III. The CNN image-based inference of detailed features

Pub Date : 2022-09-25 DOI:10.15407/knit2022.05.027
V. Khramtsov, I. Vavilova, D. Dobrycheva, M. Vasylenko, O. V. Melnyk, A. Elyiv, V. Akhmetov, A. Dmytrenko
{"title":"Machine learning technique for morphological classification of galaxies from the SDSS. III. The CNN image-based inference of detailed features","authors":"V. Khramtsov, I. Vavilova, D. Dobrycheva, M. Vasylenko, O. V. Melnyk, A. Elyiv, V. Akhmetov, A. Dmytrenko","doi":"10.15407/knit2022.05.027","DOIUrl":null,"url":null,"abstract":"This paper follows a series of our works on the applicability of various machine learning methods to morphological galaxy classification (Vavilova et al., 2021, 2022). We exploited the sample of ~315800 low-redshift SDSS DR9 galaxies with absolute stellar magnitudes of −24m < Mr < −19.4m at 0.003 < z < 0.1 redshifts as a target data set for the CNN classifier. Because it is tightly overlapped with the Galaxy Zoo 2 (GZ2) sample, we use these annotated data as the training data set to classify galaxies into 34 detailed features. In the presence of a pronounced difference in visual parameters between galaxies from the GZ2 training data set and galaxies without known morphological parameters, we applied novel procedures, which allowed us for the first time to get rid of this difference for smaller and fainter SDSS galaxies with mr < 17.7. We describe in detail the adversarial validation technique as well as how we managed the optimal train-test split of galaxies from the training data set to verify our CNN model based on the DenseNet-201 realistically. We have also found optimal galaxy image transformations, which help increase the classifier’s generalization ability. We demonstrate for the first time that implication of the CNN model with a train-test split of data sets and size-changing function simulating a decrease in magnitude and size (data augmentation) significantly improves the classification of smaller and fainter SDSS galaxies. It can be considered as another way to improve the human bias for those galaxy images that had a poor vote classification in the GZ project. Such an approach, like autoimmunization, when the CNN classifier, trained on very good galaxy images, is able to retrain bad images from the same homogeneous sample, can be considered co-planar to other methods of combating such a human bias. The most promising result is related to the CNN prediction probability in the classification of detailed features. The accuracy of the CNN classifier is in the range of 83.3—99.4 % depending on 32 features (exception is for “disturbed” (68.55 %) and “arms winding medium” (77.39 %) features). As a result, for the first time, we assigned the detailed morphological classification for more than 140000 low-redshift galaxies, especially at the fainter end. A visual inspection of the samples of galaxies with certain morphological features allowed us to reveal typical problem points of galaxy image classification by shape and features from the astronomical point of view. The morphological catalogs of low-redshift SDSS galaxies with the most interesting features are available through the UkrVO website (http://ukr-vo.org/galaxies/) and VizieR.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15407/knit2022.05.027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper follows a series of our works on the applicability of various machine learning methods to morphological galaxy classification (Vavilova et al., 2021, 2022). We exploited the sample of ~315800 low-redshift SDSS DR9 galaxies with absolute stellar magnitudes of −24m < Mr < −19.4m at 0.003 < z < 0.1 redshifts as a target data set for the CNN classifier. Because it is tightly overlapped with the Galaxy Zoo 2 (GZ2) sample, we use these annotated data as the training data set to classify galaxies into 34 detailed features. In the presence of a pronounced difference in visual parameters between galaxies from the GZ2 training data set and galaxies without known morphological parameters, we applied novel procedures, which allowed us for the first time to get rid of this difference for smaller and fainter SDSS galaxies with mr < 17.7. We describe in detail the adversarial validation technique as well as how we managed the optimal train-test split of galaxies from the training data set to verify our CNN model based on the DenseNet-201 realistically. We have also found optimal galaxy image transformations, which help increase the classifier’s generalization ability. We demonstrate for the first time that implication of the CNN model with a train-test split of data sets and size-changing function simulating a decrease in magnitude and size (data augmentation) significantly improves the classification of smaller and fainter SDSS galaxies. It can be considered as another way to improve the human bias for those galaxy images that had a poor vote classification in the GZ project. Such an approach, like autoimmunization, when the CNN classifier, trained on very good galaxy images, is able to retrain bad images from the same homogeneous sample, can be considered co-planar to other methods of combating such a human bias. The most promising result is related to the CNN prediction probability in the classification of detailed features. The accuracy of the CNN classifier is in the range of 83.3—99.4 % depending on 32 features (exception is for “disturbed” (68.55 %) and “arms winding medium” (77.39 %) features). As a result, for the first time, we assigned the detailed morphological classification for more than 140000 low-redshift galaxies, especially at the fainter end. A visual inspection of the samples of galaxies with certain morphological features allowed us to reveal typical problem points of galaxy image classification by shape and features from the astronomical point of view. The morphological catalogs of low-redshift SDSS galaxies with the most interesting features are available through the UkrVO website (http://ukr-vo.org/galaxies/) and VizieR.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
SDSS星系形态分类的机器学习技术。III、 基于CNN图像的细节特征推理
本文介绍了我们关于各种机器学习方法在形态星系分类中的适用性的一系列工作(Vavilova et al.,2022022)。我们利用了约315800个低红移SDSS DR9星系的样本作为CNN分类器的目标数据集,这些星系的绝对恒星星等为−24m
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1