Rare-Event Time Series Prediction: A Case Study of Solar Flare Forecasting
Azim Ahmadzadeh, Berkay Aydin, Dustin J. Kempton, Maxwell Hostetter, R. Angryk, M. Georgoulis, Sushant S. Mahajan
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), published 2019-12-01
DOI: 10.1109/ICMLA.2019.00293 (https://doi.org/10.1109/ICMLA.2019.00293)
Citation count: 9
Abstract
We present a case study of time series prediction models under extreme class imbalance. To create the forecasting dataset used in this study, we extracted multiple properties from the Space Weather ANalytics for Solar Flares (SWAN-SF) benchmark dataset, which comprises magnetic features from over 4075 active regions spanning a period of 9 years. In the extracted dataset, the class-imbalance ratio is 1:60, where the minority class consists of instances of strong solar flares (GOES M- and X-class). This ratio reaches 1:800 if we consider only the strongest class of flares (GOES X-class). This extreme imbalance, together with the temporal coherence of the sliced time series, presents an interesting set of challenges in forecasting scarce real-life phenomena. We explore remedies for the class-imbalance issue, such as undersampling, oversampling, and misclassification weights. In the process, we elaborate on common mistakes and pitfalls caused by ignoring the side effects of these remedies, including how and why they weaken the robustness of the trained models while seemingly improving performance.
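
The three remedies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the labels are synthetic (6000 majority and 100 minority samples, chosen only to reproduce the 1:60 ratio), the helper names are hypothetical, and the weighting formula is the common inverse-frequency heuristic, assumed here for illustration.

```python
import numpy as np


def make_imbalanced_labels(n_majority=6000, ratio=60, seed=0):
    """Synthetic binary labels with a 1:ratio minority-to-majority split (assumed data)."""
    rng = np.random.default_rng(seed)
    n_minority = n_majority // ratio
    y = np.concatenate([np.zeros(n_majority, dtype=int),
                        np.ones(n_minority, dtype=int)])
    rng.shuffle(y)
    return y


def undersample(y, rng):
    """Indices that keep every minority sample and an equal-sized random majority subset."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    kept = rng.choice(majority, size=minority.size, replace=False)
    return np.sort(np.concatenate([minority, kept]))


def oversample(y, rng):
    """Indices that keep every sample and repeat minority samples until classes balance."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=majority.size - minority.size, replace=True)
    return np.concatenate([np.arange(y.size), extra])


def class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class)."""
    counts = np.bincount(y, minlength=2)
    return y.size / (2 * counts)


rng = np.random.default_rng(42)
y_train = make_imbalanced_labels()

under_idx = undersample(y_train, rng)   # shrinks the training set
over_idx = oversample(y_train, rng)     # grows it with duplicated minority samples
weights = class_weights(y_train)        # leaves the data as-is, reweights the loss

print("undersampled size:", under_idx.size)
print("oversampled size:", over_idx.size)
print("weight ratio (minority / majority):", weights[1] / weights[0])
```

In a sketch like this, the resampling indices would be applied to the training split only, with the evaluation split left at its natural class ratio; otherwise the reported scores describe a balanced problem that does not occur in practice. With sliced time series, duplicated or temporally adjacent slices from the same source also need to stay on one side of the train/test split.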