Pub Date: 2023-12-28 | DOI: 10.1007/s10844-023-00838-5
Vincenzo Pasquadibisceglie, Annalisa Appice, Giuseppe Ieva, Donato Malerba
Retail companies are greatly interested in continuously monitoring customers' purchase traces, to identify weak customers and take the actions needed to improve customer satisfaction and keep revenues unaffected. In this paper, we formulate the customer churn prediction problem as a Predictive Process Monitoring (PPM) problem to be addressed under the possibly dynamic conditions of evolving retail data environments. To this aim, we propose TSUNAMI, a PPM approach to monitor customer loyalty in the retail sector. It processes online the sale receipt stream produced by the customers of a retail business and learns a deep neural model to detect early the purchase traces of customers who will become churners. In addition, the proposed approach integrates a mechanism to detect concept drifts in customer purchase traces and adapts the deep neural model to them. Finally, to make the decisions of customer purchase monitoring explainable to potential stakeholders, we analyse the Shapley values of decisions, to explain which characteristics of the customer purchase traces are the most relevant for disentangling churners from non-churners and how these characteristics may have changed over time. Experiments with two benchmark retail data sets explore the effectiveness of the proposed approach.
{"title":"TSUNAMI - an explainable PPM approach for customer churn prediction in evolving retail data environments","authors":"Vincenzo Pasquadibisceglie, Annalisa Appice, Giuseppe Ieva, Donato Malerba","doi":"10.1007/s10844-023-00838-5","DOIUrl":"https://doi.org/10.1007/s10844-023-00838-5","url":null,"abstract":"<p>Retail companies are greatly interested in performing continuous monitoring of purchase traces of customers, to identify weak customers and take the necessary actions to improve customer satisfaction and ensure their revenues remain unaffected. In this paper, we formulate the customer churn prediction problem as a Predictive Process Monitoring (PPM) problem to be addressed under possible dynamic conditions of evolving retail data environments. To this aim, we propose <span>TSUNAMI</span> as a PPM approach to monitor the customer loyalty in the retail sector. It processes online the sale receipt stream produced by customers of a retail business company and learns a deep neural model to early detect possible purchase customer traces that will outcome in future churners. In addition, the proposed approach integrates a mechanism to detect concept drifts in customer purchase traces and adapts the deep neural model to concept drifts. Finally, to make decisions of customer purchase monitoring explainable to potential stakeholders, we analyse Shapley values of decisions, to explain which characteristics of the customer purchase traces are the most relevant for disentangling churners from non-churners and how these characteristics have possibly changed over time. 
Experiments with two benchmark retail data sets explore the effectiveness of the proposed approach.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139065395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
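The abstract does not specify TSUNAMI's drift-detection mechanism; as a toy illustration of the general idea of detecting concept drift in a stream of purchase-trace features, a two-window mean-shift detector (all names and thresholds here are hypothetical, not the paper's method) might look like:

```python
from collections import deque
from statistics import mean

def detect_drift(stream, window=50, threshold=0.5):
    """Flag indices where the mean of a sliding recent window departs from
    the mean of a reference window by more than `threshold`.

    A toy stand-in for a concept-drift detector on a numeric feature of
    the purchase-trace stream; on drift, the reference window is reset so
    the monitor adapts to the new regime.
    """
    drifts = []
    reference = deque(maxlen=window)
    recent = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(reference) < window:
            reference.append(x)  # still filling the reference window
        else:
            recent.append(x)
            if len(recent) == window and abs(mean(recent) - mean(reference)) > threshold:
                drifts.append(i)
                reference = deque(recent, maxlen=window)  # adapt to the new regime
                recent = deque(maxlen=window)
    return drifts
```

For instance, a stream of 100 zeros followed by 100 ones triggers a single drift alarm once the recent window is dominated by the new values.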
In this paper, we propose and experimentally assess an innovative framework for scaling posterior distributions over different-curation datasets, based on Bayesian Neural Networks (BNNs). A further innovation of our study consists in enhancing the accuracy of the Bayesian classifier via intelligent sampling algorithms. The proposed methodology is relevant in emerging application settings, such as provenance detection and analysis and cybercrime. Our contributions are complemented by a comprehensive experimental evaluation and analysis over both static and dynamic image datasets. The derived results confirm the successful application of our methodology to emerging big data analytics settings.
{"title":"A bayesian-neural-networks framework for scaling posterior distributions over different-curation datasets","authors":"Alfredo Cuzzocrea, Alessandro Baldo, Edoardo Fadda","doi":"10.1007/s10844-023-00837-6","DOIUrl":"https://doi.org/10.1007/s10844-023-00837-6","url":null,"abstract":"<p>In this paper, we propose and experimentally assess <i>an innovative framework for scaling posterior distributions over different-curation datasets, based on Bayesian-Neural-Networks (BNN)</i>. Another innovation of our proposed study consists in enhancing the accuracy of the Bayesian classifier via intelligent sampling algorithms. The proposed methodology is relevant in emerging applicative settings, such as <i>provenance detection and analysis</i> and <i>cybercrime</i>. Our contributions are complemented by a comprehensive experimental evaluation and analysis over both static and dynamic image datasets. Derived results confirm the successful application of our proposed methodology to emerging <i>big data analytics</i> settings.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139051948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-12 | DOI: 10.1007/s10844-023-00835-8
Cataldo Musto, Alessandro Francesco Maria Martina, Andrea Iovine, Fedelucio Narducci, Marco de Gemmis, Giovanni Semeraro
Preference elicitation is a crucial step for every recommendation algorithm. In this paper, we present a strategy that allows users to express their preferences and needs through natural language statements. In particular, our natural language preference elicitation pipeline allows users to express preferences on objective movie features (e.g., actors, directors, etc.) as well as on subjective features collected by mining user-written movie reviews. To validate our claims, we carried out a user study in the movie domain (N = 114). The main finding of our experiment is that users tend to express their preferences through objective features, whose usage largely exceeds that of subjective features, which are more complicated to express. However, when users are able to express their preferences also in terms of subjective features, they obtain better recommendations in fewer conversation turns. We have also identified the main challenges that arise when users talk to the virtual assistant using subjective features, which paves the way for future developments of our methodology.
{"title":"Tell me what you Like: introducing natural language preference elicitation strategies in a virtual assistant for the movie domain","authors":"Cataldo Musto, Alessandro Francesco Maria Martina, Andrea Iovine, Fedelucio Narducci, Marco de Gemmis, Giovanni Semeraro","doi":"10.1007/s10844-023-00835-8","DOIUrl":"https://doi.org/10.1007/s10844-023-00835-8","url":null,"abstract":"<p>Preference elicitation is a crucial step for every recommendation algorithm. In this paper, we present a strategy that allows users to express their preferences and needs through natural language statements. In particular, our natural language preference elicitation pipeline allows users to express preferences on <i>objective</i> movie features (e.g., actors, directors, etc.) as well as on <i>subjective</i> features that are collected by mining user-written movie reviews. To validate our claims, we carried out a user study in the movie domain (<span>(N=114)</span>). The main finding of our experiment is that users tend to express their preferences by using <i>objective</i> features, whose usage largely overcomes that of <i>subjective</i> features, which are more complicated to be expressed. However, when the users are able to express their preferences also in terms of <i>subjective</i> features, they obtain better recommendations in a lower number of conversation turns. 
We have also identified the main challenges that arise when users talk to the virtual assistant by using subjective features, and this paves the way for future developments of our methodology.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138629518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
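As a rough illustration of the objective/subjective split the pipeline relies on, a lexicon lookup over a user utterance can separate the two feature types. The lexicons below are invented for the example; the paper mines subjective features from reviews rather than fixing them by hand:

```python
# Hypothetical lexicons: objective features come from structured movie
# metadata, subjective ones would be mined from user-written reviews.
OBJECTIVE_FEATURES = {"actor", "director", "genre", "year"}
SUBJECTIVE_FEATURES = {"gripping", "funny", "slow-paced", "scary"}

def classify_preferences(utterance):
    """Split the features mentioned in a user utterance into objective
    vs. subjective ones via simple lexicon lookup — a toy stand-in for
    the paper's natural language elicitation pipeline."""
    tokens = {t.strip(".,!?").lower() for t in utterance.split()}
    return {
        "objective": sorted(tokens & OBJECTIVE_FEATURES),
        "subjective": sorted(tokens & SUBJECTIVE_FEATURES),
    }
```

For example, "I want a funny movie by some famous director" yields one subjective feature ("funny") and one objective feature ("director").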
Pub Date: 2023-12-12 | DOI: 10.1007/s10844-023-00833-w
Simona Nisticò, Luigi Palopoli, Adele Pia Romano
Audio super-resolution refers to techniques that improve audio signal quality, usually via bandwidth extension methods, whereby audio enhancement is obtained by expanding the phase and the spectrogram of the input audio traces. These techniques are therefore highly significant in all those cases where audio traces miss relevant parts of the audible spectrum. In several cases, the given input signal contains the low-band frequencies (the easiest to capture with low-quality recording instruments), whereas the high band must be generated. In this paper, we illustrate techniques implemented in a system for bandwidth extension that works on musical tracks and generates the high-band frequencies starting from the low-band ones. The system, called ViT Super-resolution (ViT-SR), features an architecture based on a Generative Adversarial Network and a Vision Transformer model. In particular, two versions of the architecture are presented, which work on different input frequency ranges. Experiments reported in the paper prove the effectiveness of our approach. In particular, we demonstrate that it is possible to faithfully reconstruct the high-band signal of an audio file having only its low-band spectrum as input, including the harmonics occurring in the audio tracks, which are usually difficult to generate synthetically and significantly contribute to the final perceived sound quality.
{"title":"Audio super-resolution via vision transformer","authors":"Simona Nisticò, Luigi Palopoli, Adele Pia Romano","doi":"10.1007/s10844-023-00833-w","DOIUrl":"https://doi.org/10.1007/s10844-023-00833-w","url":null,"abstract":"<p>Audio super-resolution refers to techniques that improve the audio signals quality, usually by exploiting bandwidth extension methods, whereby audio enhancement is obtained by expanding the phase and the spectrogram of the input audio traces. These techniques are therefore much significant for all those cases where audio traces miss relevant parts of the audible spectrum. In several cases, the given input signal contains the low-band frequencies (the easiest to capture with low-quality recording instruments) whereas the high-band must be generated. In this paper, we illustrate techniques implemented into a system for bandwidth extension that works on musical tracks and generates the high-band frequencies starting from the low-band ones. The system, called <i>ViT Super-resolution</i> (<span>(textit{ViT-SR})</span>), features an architecture based on a Generative Adversarial Network and Vision Transformer model. In particular, two versions of the architecture will be presented in this paper, that work on different input frequency ranges. Experiments, which are accounted for in the paper, prove the effectiveness of our approach. 
In particular, the objective has been attained to demonstrate that it is possible to faithfully reconstruct the high-band signal of an audio file having only its low-band spectrum available as the input, therewith including the usually difficult to synthetically generate harmonics occurring in the audio tracks, which significantly contribute to the final perceived sound quality.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138630225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
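A bandwidth-extension model such as ViT-SR receives only the low-band content of a track. A minimal sketch of how such a band-limited input can be simulated by zeroing high frequencies in the real-FFT domain (the sample rate and 4 kHz cutoff in the usage below are arbitrary choices for illustration):

```python
import numpy as np

def low_band(signal, sample_rate, cutoff_hz):
    """Return a band-limited copy of `signal`: all frequency bins above
    `cutoff_hz` are zeroed in the real FFT domain.  This mimics the
    low-band input that a bandwidth-extension model must complete."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))
```

Usage: for a one-second 16 kHz mix of a 440 Hz and a 6 kHz sine, a 4 kHz cutoff leaves only the 440 Hz component, which is exactly the kind of input/target pair a bandwidth-extension system trains on.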
Food Security (FS) is a major concern in West Africa, particularly in Burkina Faso, which has been the epicenter of a humanitarian crisis since the beginning of this century. Early warning systems for FS and famines rely mainly on numerical data for their analyses, whereas textual data, which are more complex to process, are rarely used. However, such data are easy to access and represent a source of relevant information complementary to commonly used data sources. This study explores methods for obtaining the explanatory context associated with FS from textual data. Based on a corpus of local newspaper articles, we analyze FS over the last ten years in Burkina Faso. We propose an original, dedicated pipeline that combines different textual analysis approaches to obtain an explanatory model evaluated on real-world, large-scale data. The results of our analyses show that our approach provides distinct and complementary qualitative information on food security and its spatial and temporal characteristics.
{"title":"How can text mining improve the explainability of Food security situations?","authors":"Hugo Deléglise, Agnès Bégué, Roberto Interdonato, Elodie Maître d’Hôtel, Mathieu Roche, Maguelonne Teisseire","doi":"10.1007/s10844-023-00832-x","DOIUrl":"https://doi.org/10.1007/s10844-023-00832-x","url":null,"abstract":"<p>Food Security (FS) is a major concern in West Africa, particularly in Burkina Faso, which has been the epicenter of a humanitarian crisis since the beginning of this century. Early warning systems for FS and famines rely mainly on numerical data for their analyses, whereas textual data, which are more complex to process, are rarely used. However, this data is easy to access and represents a source of relevant information that is complementary to commonly used data sources. This study explores methods for obtaining the explanatory context associated with FS from textual data. Based on a corpus of local newspaper articles, we analyze FS over the last ten years in Burkina Faso. We propose an original and dedicated pipeline that combines different textual analysis approaches to obtain an explanatory model evaluated on real-world and large-scale data. The results of our analyses have proven how our approach provides significant results that offer distinct and complementary qualitative information on food security and its spatial and temporal characteristics.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138577089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Argument pair extraction (APE) is a fine-grained argument mining task that aims to identify the arguments offered by different participants in a discourse and to detect interaction relationships between arguments from different participants. In recent years, many research efforts have been devoted to APE within a multi-task learning framework. Although these approaches have achieved encouraging results, they still face several challenges. First, different types of sentence relationships, as well as different levels of information exchange among sentences, are largely ignored. Second, they model interactions between argument pairs with either an explicit or an implicit strategy alone, neglecting the complementary effect of the two strategies. In this paper, we propose a novel Mutually Enhanced Multi-Scale Relation-Aware Graph Convolutional Network (MMR-GCN) for APE. Specifically, we first design a multi-scale relation-aware graph aggregation module to explicitly model the complex relationships between review and rebuttal passage sentences. In addition, we propose a mutual enhancement transformer module to implicitly and interactively enhance the representations of review and rebuttal passage sentences. We experimentally validate MMR-GCN against state-of-the-art APE methods. Experimental results show that it considerably outperforms all baselines, with relative F1 improvements over the best-performing baseline, MRC-APE, of 3.48% and 4.43% on the two benchmark datasets, respectively.
{"title":"A mutually enhanced multi-scale relation-aware graph convolutional network for argument pair extraction","authors":"Xiaofei Zhu, Yidan Liu, Zhuo Chen, Xu Chen, Jiafeng Guo, Stefan Dietze","doi":"10.1007/s10844-023-00826-9","DOIUrl":"https://doi.org/10.1007/s10844-023-00826-9","url":null,"abstract":"<p>Argument pair extraction (APE) is a fine-grained task of argument mining which aims to identify arguments offered by different participants in some discourse and detect interaction relationships between arguments from different participants. In recent years, many research efforts have been devoted to dealing with APE in a multi-task learning framework. Although these approaches have achieved encouraging results, they still face several challenging issues. First, different types of sentence relationships as well as different levels of information exchange among sentences are largely ignored. Second, they solely model interactions between argument pairs either in an explicit or implicit strategy, while neglecting the complementary effect of the two strategies. In this paper, we propose a novel Mutually Enhanced Multi-Scale Relation-Aware Graph Convolutional Network (MMR-GCN) for APE. Specifically, we first design a multi-scale relation-aware graph aggregation module to explicitly model the complex relationships between review and rebuttal passage sentences. In addition, we propose a mutually enhancement transformer module to implicitly and interactively enhance representations of review and rebuttal passage sentences. We experimentally validate MMR-GCN by comparing with the state-of-the-art APE methods. 
Experimental results show that it considerably outperforms all baseline methods, and the relative performance improvement of MMR-GCN over the best performing baseline MRC-APE in terms of F1 score reaches to 3.48% and 4.43% on the two benchmark datasets, respectively.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
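The MMR-GCN layer itself is not given in this abstract; the core idea of relation-aware graph aggregation — one adjacency matrix and one weight matrix per relation type, aggregated separately and then combined — can be sketched generically as:

```python
import numpy as np

def relation_gcn_layer(H, adjacency_by_relation, weights_by_relation):
    """One relation-aware GCN layer: aggregate neighbours separately for
    each relation type, apply a per-relation weight matrix, sum the
    results, and apply ReLU.  A minimal sketch of multi-relation
    aggregation, not the paper's exact MMR-GCN layer.

    H: (n_nodes, d_in) sentence representations.
    adjacency_by_relation: list of (n_nodes, n_nodes) matrices.
    weights_by_relation: list of (d_in, d_out) matrices.
    """
    out = np.zeros((H.shape[0], weights_by_relation[0].shape[1]))
    for A, W in zip(adjacency_by_relation, weights_by_relation):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # avoid divide-by-zero
        out += (A / deg) @ H @ W                             # mean over neighbours
    return np.maximum(out, 0.0)                              # ReLU
```

Separating the adjacency per relation type is what lets the layer treat, e.g., within-passage links and review-to-rebuttal links with different learned transformations.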
Hiring knowledgeable and cost-effective individuals, who use their knowledge and expertise to advance the organization, is extremely important, as employees are an organization's most valuable assets. Under agile methodology, T-shaped experts are the best option. A T-shaped professional has a deep understanding of one topic and broad knowledge of several others. Compared to other types of professionals, T-shaped professionals are better communicators and cheaper to hire. Finding T-shaped experts in a given skill area requires determining each candidate's depth of knowledge and shape of expertise. To estimate each candidate's depth of knowledge in a given skill area, we propose a translation-based method that uses two attention-based skill translation models to overcome the vocabulary mismatch between skills and user documents. We also propose two new approaches, based on binary cross-entropy and focal loss, to determine whether each user is T-shaped. Our experiments on three collections of the StackOverflow dataset demonstrate the efficiency of our proposed method compared to state-of-the-art approaches.
{"title":"T-shaped expert mining: a novel approach based on skill translation and focal loss","authors":"Zohreh Fallahnejad, Mahmood Karimian, Fatemeh Lashkari, Hamid Beigy","doi":"10.1007/s10844-023-00831-y","DOIUrl":"https://doi.org/10.1007/s10844-023-00831-y","url":null,"abstract":"<p>Hiring knowledgeable and cost-effective individuals, who use their knowledge and expertise to boost the organization, is extremely important for organizations as they are the most valuable assets. T-shaped experts are the best option based on agile methodology. The T-shaped professional has a deep understanding of one topic and broad knowledge of several others. Compared to other types of professionals, T-shaped professionals are better communicators and cheaper to hire. Finding T-shaped experts in a given skill area requires determining each candidate’s depth of knowledge and shape of expertise. To estimate each candidate’s depth of knowledge in a given skill area, we propose a translation-based method that utilizes two attention-based skill translation models to overcome the vocabulary mismatch between skills and user documents. We also propose two new approaches based on binary cross-entropy and focal loss to determine whether each user is T-shaped. 
Our experiments on three collections of the StackOverflow dataset demonstrate the efficiency of our proposed method compared to the state-of-the-art approaches.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
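Focal loss (Lin et al., 2017) is a standard modification of cross-entropy that down-weights easy examples by the factor (1 - p_t)^gamma, so training focuses on hard, misclassified ones — useful when T-shaped users are a small minority. A minimal binary-case implementation (the paper's exact formulation may differ in weighting details):

```python
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t),
    averaged over examples, where p_t is the predicted probability of
    the true class.  With gamma=0 and alpha=1 this reduces to plain
    binary cross-entropy."""
    p_pred = np.clip(p_pred, eps, 1.0 - eps)            # numerical safety
    p_t = np.where(y_true == 1, p_pred, 1.0 - p_pred)   # prob. of true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

A confidently correct prediction (p_t close to 1) contributes almost nothing, while a hard example keeps nearly its full cross-entropy weight.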
Pub Date: 2023-11-24 | DOI: 10.1007/s10844-023-00829-6
Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina
LatentOut is a recently introduced algorithm for unsupervised anomaly detection that enhances latent space-based neural methods, namely (Variational) Autoencoders, GANomaly and ANOGan architectures. The main idea behind it is to exploit both the latent space and the baseline score of these architectures in order to provide a refined anomaly score, obtained by performing density estimation in the augmented latent-space/baseline-score feature space. In this paper we investigate the performance of LatentOut acting as a one-class classifier, and we experiment with the combination of LatentOut and GAAL architectures, a novel type of Generative Adversarial Network for unsupervised anomaly detection. Moreover, we show that the feature space induced by LatentOut enhances the separation between normal and anomalous data. Indeed, we show that standard data mining outlier detection methods perform better when applied to this novel augmented latent space rather than to the original data space.
{"title":"Enhancing anomaly detectors with LatentOut","authors":"Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina","doi":"10.1007/s10844-023-00829-6","DOIUrl":"https://doi.org/10.1007/s10844-023-00829-6","url":null,"abstract":"<p><span>({{textbf{Latent}}varvec{Out}})</span> is a recently introduced algorithm for unsupervised anomaly detection which enhances latent space-based neural methods, namely (<i>Variational</i>) <i>Autoencoders</i>, <i>GANomaly</i> and <i>ANOGan</i> architectures. The main idea behind it is to exploit both the latent space and the baseline score of these architectures in order to provide a refined anomaly score performing density estimation in the augmented latent-space/baseline-score feature space. In this paper we investigate the performance of <span>({{textbf{Latent}}varvec{Out}})</span> acting as a one-class classifier and we experiment the combination of <span>({{textbf{Latent}}varvec{Out}})</span> with <i>GAAL</i> architectures, a novel type of Generative Adversarial Networks for unsupervised anomaly detection. Moreover, we show that the feature space induced by <span>({{textbf{Latent}}varvec{Out}})</span> has the characteristic to enhance the separation between normal and anomalous data. 
Indeed, we prove that standard data mining outlier detection methods perform better when applied on this novel augmented latent space rather than on the original data space.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-18 | DOI: 10.1007/s10844-023-00828-7
Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Ju Jing
Geomagnetic activities have a crucial impact on Earth and can affect spacecraft and electrical power grids. Geospace scientists use a geomagnetic index, called the Kp index, to describe the overall level of geomagnetic activity. This index is an important indicator of disturbances in the Earth's magnetic field and is used by the U.S. Space Weather Prediction Center as an alert and warning service for users who may be affected by the disturbances. Another commonly used index, the ap index, is converted from the Kp index. Early and accurate prediction of the Kp and ap indices is essential for preparedness and disaster risk management. In this paper, we present a deep learning framework, named GNet, that performs short-term forecasting of the Kp and ap indices. Specifically, GNet takes as input time series of solar wind parameter values, provided by NASA's Space Science Data Coordinated Archive, and predicts as output the Kp and ap indices at time point t + w hours for a given time point t, where w ranges from 1 to 9. GNet combines transformer encoder blocks with Bayesian inference, which makes it capable of quantifying both aleatoric uncertainty (data uncertainty) and epistemic uncertainty (model uncertainty) in its predictions. Experimental results show that GNet outperforms closely related machine learning methods in terms of root mean square error and R-squared score. Furthermore, GNet can provide both data and model uncertainty quantification results, which the existing methods cannot offer. To our knowledge, this is the first time that Bayesian transformers have been used for geomagnetic activity prediction.
{"title":"A transformer-based framework for predicting geomagnetic indices with uncertainty quantification","authors":"Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Ju Jing","doi":"10.1007/s10844-023-00828-7","DOIUrl":"https://doi.org/10.1007/s10844-023-00828-7","url":null,"abstract":"<p>Geomagnetic activities have a crucial impact on Earth, which can affect spacecraft and electrical power grids. Geospace scientists use a geomagnetic index, called the Kp index, to describe the overall level of geomagnetic activity. This index is an important indicator of disturbances in the Earth’s magnetic field and is used by the U.S. Space Weather Prediction Center as an alert and warning service for users who may be affected by the disturbances. Another commonly used index, called the ap index, is converted from the Kp index. Early and accurate prediction of the Kp and ap indices is essential for preparedness and disaster risk management. In this paper, we present a deep learning framework, named GNet, to perform short-term forecasting of the Kp and ap indices. Specifically, GNet takes as input time series of solar wind parameters’ values, provided by NASA’s Space Science Data Coordinated Archive, and predicts as output the Kp and ap indices respectively at time point <span>(varvec{t + w})</span> hours for a given time point <span>(varvec{t})</span> where <span>(varvec{w})</span> ranges from 1 to 9. GNet combines transformer encoder blocks with Bayesian inference, which is capable of quantifying both aleatoric uncertainty (data uncertainty) and epistemic uncertainty (model uncertainty) in making predictions. Experimental results show that GNet outperforms closely related machine learning methods in terms of the root mean square error and R-squared score. Furthermore, GNet can provide both data and model uncertainty quantification results, which the existing methods cannot offer. 
To our knowledge, this is the first time that Bayesian transformers have been used for geomagnetic activity prediction.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138516069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
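The aleatoric/epistemic split mentioned above has a standard Monte-Carlo form: over T stochastic forward passes of a Bayesian model, total predictive variance decomposes into the mean of the per-pass variances (aleatoric, data noise) plus the variance of the per-pass means (epistemic, model disagreement). A minimal sketch of that decomposition, independent of GNet's specific architecture:

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split total predictive variance from T stochastic forward passes
    (e.g. of a Bayesian transformer) into its two parts:
      aleatoric  = mean of the per-pass predicted variances (data noise),
      epistemic  = variance of the per-pass predicted means (model noise).

    means, variances: arrays of shape (T,) for a single forecast target.
    """
    aleatoric = float(np.mean(variances))
    epistemic = float(np.var(means))
    return aleatoric, epistemic, aleatoric + epistemic
```

If every pass predicts the same mean, epistemic uncertainty is zero and all residual variance is attributed to the data; disagreement between passes shows up entirely in the epistemic term.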