Classification of Continuous Sky Brightness Data Using Random Forest

IF 1.6 | Zone 4, Physics and Astrophysics | Q3 ASTRONOMY & ASTROPHYSICS | Advances in Astronomy | Pub Date: 2020-04-01 | DOI: 10.1155/2020/5102065

R. Priyatikanto, Lidia Mayangsari, Rudi A. Prihandoko, A. Admiranto


Citations: 3

Abstract

Sky brightness measurement and monitoring are needed to mitigate the negative effects of light pollution, a byproduct of modern civilization. Proper handling of a large body of sky brightness data includes evaluating and classifying the data according to their quality and characteristics, so that further analysis and inference can be carried out correctly. This study aims to develop a classification model based on the Random Forest algorithm and to evaluate its performance. Using sky brightness data from 1250 nights, recorded at one-minute temporal resolution at eight different stations in Indonesia, datasets consisting of 15 features were created to train and test the model. These features were extracted from the observation time, the global statistics of nightly sky brightness, or the light curve characteristics. Of these features, 10 are considered the most important for the classification task. The model was trained to classify the data into six classes (1: peculiar data, 2: overcast, 3: cloudy, 4: clear, 5: moonlit-cloudy, and 6: moonlit-clear), and testing showed high accuracy (92%) and high scores (F-score = 84% and G-mean = 84%). Some misclassifications exist, but the classification results are reasonably good, as indicated by the posterior distributions of sky brightness for each class. Data classified as class 4 have a sharp distribution with a typical full width at half maximum (FWHM) of 1.5 mag/arcsec², while the distributions of classes 2 and 3 are left-skewed, the latter with a lighter tail. Because of moonlight, the distributions of class-5 and class-6 data are more smeared, that is, they have a larger spread. These results demonstrate that the established classification model is reasonably good and consistent.
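The feature-extraction step can be illustrated with a minimal Python sketch. The abstract does not list the 15 features, so the four below (median, standard deviation, range, and a "roughness" measure of minute-to-minute variation) are hypothetical stand-ins for the "global statistics" and "light curve characteristics" it mentions, not the authors' feature set.

```python
from statistics import mean, median, pstdev


def nightly_features(brightness):
    """Compute illustrative per-night features from one night's light curve.

    brightness: sky brightness readings in mag/arcsec^2, one per minute, in
    time order. The feature names here are hypothetical, not the paper's.
    """
    if len(brightness) < 2:
        raise ValueError("need at least two readings to measure variation")
    # Minute-to-minute changes in sky brightness.
    diffs = [later - earlier for earlier, later in zip(brightness, brightness[1:])]
    return {
        # Global statistics of the night.
        "median": median(brightness),
        "std": pstdev(brightness),
        "range": max(brightness) - min(brightness),
        # Light-curve characteristic: mean absolute minute-to-minute change,
        # assumed here to grow when passing clouds make the curve jagged.
        "roughness": mean([abs(d) for d in diffs]),
    }
```

Stacking one such dictionary per night into a feature matrix, together with a class label per night, gives the kind of dataset a Random Forest can be trained on. For instance, scikit-learn's `RandomForestClassifier` exposes a `feature_importances_` attribute that is one way to rank features, although the abstract does not say which implementation or importance measure the authors used.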
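The reported accuracy, F-score, and G-mean can all be derived from a confusion matrix. The abstract does not define the multiclass variants used, so this sketch assumes two common choices: the macro-averaged F1 score (the mean of per-class F1 scores) and the G-mean as the geometric mean of per-class recalls.

```python
from math import prod


def classification_scores(confusion):
    """Return (accuracy, macro F-score, G-mean) for a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and whose predicted
    class is j. Macro F1 and recall-based G-mean are assumed definitions.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total

    f_scores, recalls = [], []
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                          # row sum: true class k
        predicted = sum(confusion[i][k] for i in range(n))  # column sum: predicted as k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    macro_f = sum(f_scores) / n
    g_mean = prod(recalls) ** (1 / n)  # zero if any class is never recalled
    return accuracy, macro_f, g_mean
```

Because accuracy is dominated by the most populous classes while both the macro F-score and the G-mean weight every class equally, accuracy can be noticeably higher than the other two whenever a minority class is harder to classify.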
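The full width at half maximum quoted for class-4 data can be estimated from a histogram of the sky brightness values assigned to a class. The minimal sketch below, with a hypothetical `bin_width` parameter, counts the contiguous bins around the peak whose counts are at least half the peak count. It is a coarse estimate for a single-peaked distribution, not the authors' procedure, which the abstract does not describe.

```python
def histogram_fwhm(values, bin_width=0.1):
    """Coarse FWHM of a single-peaked distribution, in the units of `values`.

    values: sky brightness readings (e.g. mag/arcsec^2) belonging to one class.
    bin_width: histogram bin width; a hypothetical default, not from the paper.
    """
    low = min(values)
    n_bins = int((max(values) - low) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - low) / bin_width)] += 1

    peak = max(range(n_bins), key=counts.__getitem__)
    half = counts[peak] / 2
    # Walk outwards from the peak while counts stay at or above half maximum.
    left = peak
    while left > 0 and counts[left - 1] >= half:
        left -= 1
    right = peak
    while right < n_bins - 1 and counts[right + 1] >= half:
        right += 1
    return (right - left + 1) * bin_width
```

Interpolating linearly between the bins on either side of each half-maximum crossing would give a finer estimate than this whole-bin count.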




Source journal

Advances in Astronomy (ASTRONOMY & ASTROPHYSICS)

CiteScore

2.70

Self-citation rate

7.10%


Review time

22 weeks

Journal description: Advances in Astronomy publishes articles in all areas of astronomy, astrophysics, and cosmology. The journal accepts both observational and theoretical investigations into celestial objects and the wider universe, as well as reports of new methods and instrumentation for their study.