Sentiment analysis, as a part of natural language processing (NLP), has received much attention following the demand to understand people’s opinions. Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task that aims to classify sentiment at the aspect level. Over the years, researchers have formulated ABSA into various tasks for different scenarios. Unlike early works, current ABSA tasks combine multiple sentiment elements to produce more detailed and informative results. However, the field is difficult to survey comprehensively because of its many different tasks, terms, and results. This paper surveyed recent studies on ABSA, focusing on its complex compound tasks. We investigated the key elements, problem formulations, and datasets currently used by the ABSA community. We reviewed the latest methodologies and identified the current state of the art through a comparative analysis. From our study, we found a shift toward generative methods for solving the ABSA problem, which signifies an evolving emphasis on holistic, end-to-end approaches. Finally, we identified open challenges and future directions for ABSA research.
Title: Methodologies and their comparison in complex compound aspect-based sentiment analysis: A survey
Authors: Faiz Ghifari Haznitrama, Ho-Jin Choi, Chin-Wan Chung
DOI: 10.1016/j.aiopen.2025.02.002
AI Open, Volume 6 (2025), Pages 53-69
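To make the compound-task idea concrete, here is a toy sketch (my own illustration, not from the survey) of the structured output an aspect-sentiment-quadruple task produces, together with the exact-match F1 commonly used to score such predictions. The field names and example sentence are assumptions for illustration only.

```python
review = "The pasta was great but the service was slow"
gold = [
    {"aspect": "pasta",   "category": "FOOD",    "opinion": "great", "polarity": "positive"},
    {"aspect": "service", "category": "SERVICE", "opinion": "slow",  "polarity": "negative"},
]

def quad_f1(pred, gold):
    # Exact-match F1 over predicted quadruples: a quad counts as a true
    # positive only if every element matches a gold quad.
    tp = sum(1 for q in pred if q in gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

score = quad_f1(gold, gold)  # a perfect prediction scores 1.0
```

The strict all-elements match is what makes compound tasks harder than single-element extraction: one wrong polarity or opinion span invalidates the whole tuple.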
Pub Date: 2025-01-01 | Epub Date: 2025-07-01 | DOI: 10.1016/j.aiopen.2025.04.001
Zhi Chen, Da Ma, Hanqi Li, Lu Chen, Jiabao Ji, Yuncong Liu, Bei Chen, Mengyue Wu, Su Zhu, Xin Dong, Fujiang Ge, Qingliang Miao, Jian-Guang Lou, Shuai Fan, Kai Yu
Building a universal conversational agent has been a long-standing goal of the dialogue research community. Most previous works focus on only a small set of dialogue tasks. In this work, we aim to build a unified dialogue foundation model (DFM) that can solve a massive range of diverse dialogue tasks. To achieve this goal, we collect DialogZoo, a large-scale, well-annotated dialogue dataset with rich task diversity. We introduce a framework that unifies all dialogue tasks and propose novel auxiliary self-supervised tasks to achieve stable training of DFM on the highly diverse, large-scale DialogZoo corpus. Experiments show that, compared with models of the same size, DFM achieves competitive performance on a rich set of cross-domain downstream dialogue tasks. Furthermore, DFM remains effective when scaled to large language models. This demonstrates that DFM largely extends the capability of unified dialogue pre-trained models.
Title: DFM: Dialogue foundation model for universal large-scale dialogue-oriented task learning
DOI: 10.1016/j.aiopen.2025.04.001
AI Open, Volume 6 (2025), Pages 108-117
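The "unify all dialogue tasks" idea generally means serializing heterogeneous tasks into a single text-to-text format. The sketch below is a hypothetical illustration of that pattern; the tag names and field layout are my assumptions, not DialogZoo's actual schema.

```python
def to_unified(task, dialogue, extra=""):
    # Serialize any dialogue task as "[TASK=...] turn | turn ...",
    # optionally appending grounding knowledge, so one seq2seq model
    # can consume intent detection, response generation, etc. alike.
    prompt = f"[TASK={task}] " + " | ".join(dialogue)
    if extra:
        prompt += f" [KNOWLEDGE] {extra}"
    return prompt

ex1 = to_unified("intent", ["user: book a flight to Paris"])
ex2 = to_unified("response", ["user: hi", "system: hello"], extra="greetings FAQ")
```

With every task flattened into the same prompt/target shape, a single model can be trained jointly across all of them.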
Pub Date: 2025-01-01 | Epub Date: 2025-09-23 | DOI: 10.1016/j.aiopen.2025.09.001
Francesco Di Feola, Lorenzo Tronchin, Valerio Guarrasi, Paolo Soda
Generative Adversarial Networks (GANs) have proven to be a powerful framework for denoising applications in medical imaging. However, GAN-based denoising algorithms still struggle to capture complex relationships within images. In this regard, the loss function plays a crucial role in guiding the image generation process, quantifying how much a synthetic image differs from a real one. To grasp highly complex and non-linear textural relationships during training, this work presents a novel approach that captures and embeds multi-scale texture information into the loss function. Our method introduces a differentiable multi-scale texture representation of the images, dynamically aggregated by a self-attention layer, thus enabling end-to-end gradient-based optimization. We validate our approach through extensive experiments on low-dose CT denoising, a challenging application that aims to enhance the quality of noisy CT scans. We use three publicly available datasets: one simulated and two real. The results are promising compared with other well-established loss functions and are consistent across three different GAN architectures. The code is available at: https://github.com/trainlab/MSTLF-TextureLoss.
Title: Multi-scale texture loss for CT denoising with GANs
DOI: 10.1016/j.aiopen.2025.09.001
AI Open, Volume 6 (2025), Pages 142-154
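A heavily simplified sketch of the multi-scale idea, under stated assumptions: variance stands in for the paper's texture descriptors, and fixed softmax weights stand in for the learned self-attention aggregation. This is an illustration of the structure, not the authors' implementation.

```python
import math

def downsample(signal):
    # Keep every other sample; a crude stand-in for image downscaling.
    return signal[::2]

def texture_stat(signal):
    # Variance as a toy texture proxy (the paper uses richer descriptors).
    m = sum(signal) / len(signal)
    return sum((x - m) ** 2 for x in signal) / len(signal)

def multiscale_texture_loss(real, fake, scores):
    # Softmax over per-scale scores mimics attention-style aggregation.
    total = sum(math.exp(s) for s in scores)
    weights = [math.exp(s) / total for s in scores]
    loss = 0.0
    for w in weights:
        loss += w * abs(texture_stat(real) - texture_stat(fake))
        real, fake = downsample(real), downsample(fake)  # next scale
    return loss

real = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0]  # high-texture signal
fake = [3.0] * 8                                  # over-smoothed output
zero = multiscale_texture_loss(real, real, [0.0, 0.0, 0.0])
gap = multiscale_texture_loss(real, fake, [0.0, 0.0, 0.0])
```

The loss is zero for identical inputs and positive when the generator over-smooths, which is exactly the failure mode a texture-aware loss is meant to penalize.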
Pub Date: 2025-01-01 | Epub Date: 2025-02-12 | DOI: 10.1016/j.aiopen.2025.01.001
Rui Hao, Linmei Hu, Weijian Qi, Qingliu Wu, Yirui Zhang, Liqiang Nie
Dialogue-based language models mark a major milestone in artificial intelligence, owing to their impressive ability to interact with users and to handle challenging tasks prompted by customized instructions. However, prevalent large-scale dialogue-based language models like ChatGPT still have room for improvement, such as unstable responses to questions and an inability to think cooperatively like humans. Considering the conversational ability of dialogue-based language models and their inherent randomness in thinking, we propose the ChatLLM network, which allows multiple dialogue-based language models to interact, provide feedback, and think together. We design a network of ChatLLMs consisting of multiple layers of language models. Specifically, individual language model instances may hold distinct perspectives on the same problem, and by consolidating these diverse viewpoints via a separate language model, the ChatLLM network can make decisions more objectively and comprehensively. In addition, a language-based feedback mechanism, comparable to backpropagation, is devised to update the outputs of the language models within the network. This stratified system of interaction can be likened to the relationship between leaders and employees in a social organization, where collective decision-making often yields superior judgments or resolutions. Experiments demonstrate that our network attains significant improvements in problem-solving, with observable progress for each member.
Title: ChatLLM network: More brains, more intelligence
DOI: 10.1016/j.aiopen.2025.01.001
AI Open, Volume 6 (2025), Pages 45-52
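The layered interaction can be sketched as follows. The worker and consolidator functions are illustrative stand-ins for real model calls (a majority vote replaces the consolidating LLM), so this shows the control flow, not the authors' system.

```python
def worker_a(question):
    # Stand-in for one dialogue-based language model instance.
    return {"answer": 4, "rationale": "2 + 2 = 4"}

def worker_b(question):
    # A second instance with its own (here, agreeing) perspective.
    return {"answer": 4, "rationale": "adding two and two gives four"}

def consolidate(question, opinions):
    # Majority vote stands in for the separate consolidating model.
    answers = [o["answer"] for o in opinions]
    return max(set(answers), key=answers.count)

def chatllm_network(question, workers, consolidator):
    opinions = [w(question) for w in workers]  # first layer: independent views
    return consolidator(question, opinions)   # second layer: consolidation

result = chatllm_network("What is 2 + 2?", [worker_a, worker_b], consolidate)
```

The paper's language-based feedback mechanism would additionally pass the consolidated critique back to the workers for another round, which is where the backpropagation analogy comes from.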
Pub Date: 2025-01-01 | Epub Date: 2024-01-09 | DOI: 10.1016/j.aiopen.2024.01.002
Title: Erratum regarding Declaration of Competing Interest statements in previously published articles
DOI: 10.1016/j.aiopen.2024.01.002
AI Open, Volume 6 (2025), Pages 331-332
Pub Date: 2025-01-01 | Epub Date: 2025-03-03 | DOI: 10.1016/j.aiopen.2025.02.001
Michail Mamalakis, Antonios Mamalakis, Ingrid Agartz, Lynn Egeland Mørch-Johnsen, Graham K. Murray, John Suckling, Pietro Lio
The accelerated progress of artificial intelligence (AI) has popularized deep learning models across various domains, yet their inherent opacity poses challenges, particularly in critical fields like healthcare, medicine, and the geosciences. Explainable AI (XAI) has emerged to shed light on these ‘black box’ models, aiding in deciphering their decision-making processes. However, different XAI methods often produce significantly different explanations, leading to high inter-method variability that increases uncertainty and undermines trust in deep networks’ predictions. In this study, we address this challenge by introducing a novel framework designed to enhance the explainability of deep networks through a dual focus on maximizing both the accuracy and the comprehensibility of explanations. Our framework integrates outputs from multiple established XAI methods and leverages a non-linear neural network model, termed the ‘Explanation optimizer’, to construct a unified, optimal explanation. The optimizer evaluates explanation quality with two primary metrics: faithfulness, which measures how accurately the explanation reflects the network’s decision-making, and complexity, which assesses how comprehensible the explanation is. By balancing these metrics, the optimizer provides explanations that are both accurate and accessible, addressing a central limitation of current XAI methods. Through experiments on multi-class and binary classification tasks in both 2D object and 3D neuroscience imaging, we validate the efficacy of our approach. Our explanation optimizer achieved faithfulness scores averaging 155% and 63% higher than the best-performing individual XAI methods in the 3D and 2D applications, respectively, while also reducing complexity to enhance comprehensibility. These results demonstrate that optimal explanations based on specific quality criteria are achievable, offering a solution to the issue of inter-method variability in the current XAI literature and supporting more trustworthy deep network predictions.
Title: Solving the enigma: Enhancing faithfulness and comprehensibility in explanations of deep networks
DOI: 10.1016/j.aiopen.2025.02.001
AI Open, Volume 6 (2025), Pages 70-81
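A toy sketch of the optimize-over-combinations idea: two hypothetical attribution maps are blended with a weight, and candidates are scored by a faithfulness proxy minus a complexity penalty. The maps, reference, metrics, and grid search are all my illustrative assumptions; the paper uses a neural optimizer and principled metric definitions.

```python
map_a = [0.9, 0.1, 0.0, 0.0]      # attribution map from XAI method A
map_b = [0.5, 0.4, 0.1, 0.0]      # attribution map from XAI method B
reference = [1.0, 0.0, 0.0, 0.0]  # toy stand-in for the network's true reliance

def faithfulness(m):
    # Negative L1 distance to the reference: higher means more faithful.
    return -sum(abs(x - r) for x, r in zip(m, reference))

def complexity(m):
    # Number of non-negligible attributions: fewer is easier to read.
    return sum(1 for x in m if x > 0.05)

def combine(w):
    # Blend the two methods' maps with weight w on method A.
    return [w * a + (1 - w) * b for a, b in zip(map_a, map_b)]

def score(w):
    # Trade faithfulness against complexity, as the optimizer does.
    return faithfulness(combine(w)) - 0.1 * complexity(combine(w))

best_w = max((k / 10 for k in range(11)), key=score)
```

Here the grid search settles on the sparser, more reference-aligned map, mirroring how the Explanation optimizer favors explanations that are simultaneously faithful and simple.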
Pub Date: 2025-01-01 | Epub Date: 2025-09-22 | DOI: 10.1016/j.aiopen.2025.08.004
Zahiriddin Rustamov, Ayham Zaitouny, Nazar Zaki
Instance selection (IS) addresses the critical challenge of reducing dataset size while retaining informative characteristics, a task of growing importance as datasets reach millions of instances. Current IS methods often struggle to capture complex relationships in high-dimensional spaces and to scale to large datasets. This paper introduces a graph attention-based instance selection (GAIS) method that uses attention mechanisms to identify informative instances through their structural relationships in graph representations. We present two approaches for scalable graph construction: a distance-based mini-batch sampling technique that achieves dataset-size-independent complexity through strategic batch processing, and a hierarchical hashing approach that enables efficient similarity computation through random projections. The mini-batch approach preserves class distributions through stratified sampling, while the hierarchical hashing method captures relationships at multiple granularities through single-level, multi-level, and multi-view variants. Experiments across 39 datasets show that GAIS achieves reduction rates above 96% while maintaining or improving model performance relative to state-of-the-art IS methods. The findings show that the distance-based mini-batch approach offers optimal efficiency for large-scale datasets, while the multi-view variants excel on complex, high-dimensional data, demonstrating that attention-based importance scoring can effectively identify instances critical for maintaining decision boundaries while avoiding computationally prohibitive pairwise comparisons. The code is publicly available at https://github.com/zahiriddin-rustamov/gais.
Title: Scalable graph attention-based instance selection via mini-batch sampling and hierarchical hashing
DOI: 10.1016/j.aiopen.2025.08.004
AI Open, Volume 6 (2025), Pages 167-182
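The random-projection hashing that underlies the hierarchical variant can be sketched minimally as follows. Instances whose signed projections onto random hyperplanes agree land in the same bucket, so similarity only needs to be computed within buckets rather than over all pairs. This is a generic sign-random-projection sketch, not GAIS's actual multi-level scheme.

```python
import random

random.seed(0)
DIM, N_PLANES = 4, 3
# Random Gaussian hyperplanes shared by all instances.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def bucket(x):
    # One bit per hyperplane: the sign of x's projection onto it.
    return tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0)
                 for plane in planes)

b = [0.3, -1.2, 0.7, 2.0]
c = [-v for v in b]          # exactly opposite direction of b

bb, bc = bucket(b), bucket(c)
```

Because every projection of `c` negates the corresponding projection of `b`, all sign bits flip, sending the two opposite-direction points to complementary buckets; near-duplicate points, by contrast, tend to share a bucket. A multi-level variant would simply rehash within each bucket using fresh planes.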
Long-term time series forecasting (LTSF) is crucial in modern society, playing a pivotal role in long-term planning and early warning systems. While many Transformer-based models have recently been introduced for LTSF, doubts have been raised about the effectiveness of attention modules in capturing cross-time dependencies. In this study, we design a mask-series experiment to validate this assumption and subsequently propose the “Cross-variable Linear Integrated ENhanced Transformer for Multivariate Long-Term Time Series Forecasting” (Client), an advanced model that outperforms both traditional Transformer-based models and linear models. Client employs a linear module to learn trend information and an enhanced Transformer module to capture cross-variable dependencies. Meanwhile, the cross-variable Transformer module in Client simplifies the embedding and position encoding layers and replaces the decoder module with a projection layer. Extensive experiments on nine real-world datasets confirm the state-of-the-art (SOTA) performance of Client, with the least computation time and memory consumption compared with previous Transformer-based models. Our code is available at https://github.com/daxin007/Client.
Title: Client: Cross-variable linear integrated enhanced transformer for multivariate long-term time series forecasting
Authors: Jiaxin Gao, Wenbo Hu, Dongxiao Zhang, Yuntian Chen
DOI: 10.1016/j.aiopen.2025.06.001
Pub Date: 2025-01-01
AI Open, Volume 6 (2025), Pages 93-107
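The division of labor between the two modules can be illustrated with a toy sketch: a least-squares linear extrapolation handles each variable's trend, and a fixed weighted blend stands in for the learned cross-variable Transformer. Both functions are illustrative assumptions, not Client's implementation.

```python
def linear_trend_forecast(series, horizon):
    # Least-squares line over time indices 0..n-1, extrapolated forward;
    # this plays the role of Client's linear (trend) module.
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]

def cross_variable_mix(forecasts, weights):
    # Stand-in for the cross-variable Transformer module: each output
    # variable is a weighted blend of every variable's linear forecast.
    return [[sum(w * f[i] for w, f in zip(w_row, forecasts))
             for i in range(len(forecasts[0]))]
            for w_row in weights]

var1 = [1.0, 2.0, 3.0, 4.0]       # linear trend, slope 1
var2 = [10.0, 10.0, 10.0, 10.0]   # flat series
f = [linear_trend_forecast(var1, 2), linear_trend_forecast(var2, 2)]
mixed = cross_variable_mix(f, [[1.0, 0.0], [0.5, 0.5]])
```

The point of the design is that trend extrapolation needs no attention at all, so attention capacity can be spent entirely on dependencies across variables.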
Pub Date: 2025-01-01 | Epub Date: 2025-09-29 | DOI: 10.1016/j.aiopen.2025.09.002
Shuang Wu, Daniela M. Romano
Emotion recognition has become increasingly significant in artificial intelligence; however, the impact of body movements on emotion interpretation remains under-explored. This paper presents a novel Hybrid Bayesian Pre-trained Long Short-Term Memory (HBP-LSTM) framework that combines low-level pose data with high-level kinematic features, utilising Bayesian inference to enhance the accuracy and robustness of emotion recognition. The proposed model is trained on high-quality laboratory data to capture the fundamental patterns of emotional expression through body movements. To evaluate the model’s robustness during testing, we introduce noise and employ adversarial attack methods such as the Fast Gradient Sign Method (FGSM). This assesses HBP-LSTM’s ability to maintain performance under data degradation and adversarial conditions, common challenges in real-world scenarios. We validated HBP-LSTM on two public datasets, EGBM and KDAEE, demonstrating high robustness against noise and adversarial perturbations and outperforming traditional models. HBP-LSTM accurately identifies seven basic emotions (happiness, sadness, surprise, fear, anger, disgust, and neutrality) with accuracies of 98% and 88% on the EGBM and KDAEE datasets, respectively. HBP-LSTM is thus a noise-resistant model within a reliable emotion recognition framework, laying the foundation for future applications of emotion recognition technology in more challenging real-world environments.
Title: Robust emotion recognition using hybrid Bayesian LSTM based on Laban movement analysis
DOI: 10.1016/j.aiopen.2025.09.002
AI Open, Volume 6 (2025), Pages 183-203
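FGSM, the attack used above to probe robustness, perturbs an input by epsilon times the sign of the loss gradient. A minimal one-dimensional sketch under stated assumptions (a toy squared-error "model", not the paper's network):

```python
def loss(x, target):
    # Toy objective standing in for the network's loss.
    return (x - target) ** 2

def grad_loss(x, target):
    # Analytic gradient of the squared error w.r.t. the input.
    return 2 * (x - target)

def fgsm(x, target, eps):
    g = grad_loss(x, target)
    sign = (g > 0) - (g < 0)   # sign of the gradient: -1, 0, or +1
    return x + eps * sign      # step in the direction that increases the loss

x, target, eps = 1.0, 0.0, 0.1
x_adv = fgsm(x, target, eps)
```

The perturbation is tiny and bounded by eps, yet it is guaranteed not to decrease the loss, which is why FGSM is a standard first check of a model's robustness.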
Pub Date: 2025-01-01 | Epub Date: 2024-01-06 | DOI: 10.1016/j.aiopen.2024.01.001
Title: Erratum regarding Declaration of Competing Interest statements in previously published articles
DOI: 10.1016/j.aiopen.2024.01.001
AI Open, Volume 6 (2025), Pages 329-330