
AI Open — Latest Publications

Scalable graph attention-based instance selection via mini-batch sampling and hierarchical hashing
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.08.004
Zahiriddin Rustamov , Ayham Zaitouny , Nazar Zaki
Instance selection (IS) addresses the critical challenge of reducing dataset size while preserving informative characteristics, a task that becomes increasingly important as datasets grow to millions of instances. Current IS methods often struggle to capture complex relationships in high-dimensional spaces and to scale to large datasets. This paper introduces a graph attention-based instance selection (GAIS) method that uses attention mechanisms to identify informative instances through their structural relationships in graph representations. We present two approaches for scalable graph construction: a distance-based mini-batch sampling technique that achieves dataset-size-independent complexity through strategic batch processing, and a hierarchical hashing approach that enables efficient similarity computation through random projections. The mini-batch approach preserves class distributions through stratified sampling, while the hierarchical hashing method captures relationships at multiple granularities through single-level, multi-level, and multi-view variants. Experiments across 39 datasets show that GAIS achieves reduction rates above 96% while maintaining or improving model performance relative to state-of-the-art IS methods. The findings show that the distance-based mini-batch approach offers optimal efficiency for large-scale datasets, while the multi-view variants excel on complex, high-dimensional data, demonstrating that attention-based importance scoring can effectively identify instances important for maintaining decision boundaries while avoiding computationally prohibitive pairwise comparisons. The code is publicly available at https://github.com/zahiriddin-rustamov/gais.
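The random-projection step behind the hierarchical hashing idea can be illustrated in a few lines: each instance is hashed by the sign pattern of random hyperplane projections, so nearby points tend to collide in the same bucket and similarity search can be restricted to bucket members instead of all pairs. This is a generic locality-sensitive-hashing sketch, not the GAIS implementation; all names are illustrative.

```python
import numpy as np

def random_projection_hash(X, n_bits=8, seed=0):
    """Hash each row of X into an n_bits binary code via random projections.

    Rows whose codes agree in many bits are likely close in cosine
    distance, so candidate neighbour pairs can be gathered per hash
    bucket rather than by exhaustive pairwise comparison.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))  # random hyperplanes
    return (X @ planes > 0).astype(np.uint8)            # sign pattern per row

X = np.vstack([np.ones((3, 5)), -np.ones((3, 5))])      # two tight clusters
codes = random_projection_hash(X)
# identical points always share a code; antipodal points never do
assert (codes[0] == codes[1]).all() and (codes[0] != codes[3]).all()
```

Multi-level variants of this idea simply vary `n_bits`: fewer bits give coarse buckets (high recall), more bits give fine buckets (high precision).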
Citations: 0
Client: Cross-variable linear integrated enhanced transformer for multivariate long-term time series forecasting
Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.06.001
Jiaxin Gao , Wenbo Hu , Dongxiao Zhang , Yuntian Chen
Long-term time series forecasting (LTSF) is crucial in modern society, playing a pivotal role in facilitating long-term planning and developing early warning systems. While many Transformer-based models have recently been introduced for LTSF, doubts have been raised about the effectiveness of attention modules in capturing cross-time dependencies. In this study, we design a mask-series experiment to validate this assumption and subsequently propose the "Cross-variable Linear Integrated ENhanced Transformer for Multivariate Long-Term Time Series Forecasting" (Client), an advanced model that outperforms both traditional Transformer-based models and linear models. Client employs a linear module to learn trend information and an enhanced Transformer module to capture cross-variable dependencies. Meanwhile, the cross-variable Transformer module in Client simplifies the embedding and position encoding layers and replaces the decoder module with a projection layer. Extensive experiments on nine real-world datasets confirm the SOTA performance of Client, with the least computation time and memory consumption compared with previous Transformer-based models. Our code is available at https://github.com/daxin007/Client.
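The core architectural idea, attending across variables rather than across time steps, can be shown with a minimal NumPy sketch in which each variable's full history is one attention token. This illustrates the general mechanism under simplifying assumptions (no learned projections, single head) and is not the Client code; all names are hypothetical.

```python
import numpy as np

def cross_variable_attention(X):
    """Self-attention where the tokens are *variables*, not time steps.

    X has shape (time, variables); each variable's entire history is one
    token, so the attention weights express dependencies between
    variables rather than between time points.
    """
    tokens = X.T                                   # (variables, time)
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over variables
    return weights @ tokens                        # mixed variable tokens

X = np.random.default_rng(0).standard_normal((96, 4))  # 96 steps, 4 variables
out = cross_variable_attention(X)
assert out.shape == (4, 96)  # one mixed token per variable
```

With time steps as tokens there would be 96 tokens of length 4; transposing first keeps the attention matrix at the (small) number of variables.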
Citations: 0
Robust emotion recognition using hybrid Bayesian LSTM based on Laban movement analysis
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.09.002
Shuang Wu , Daniela M. Romano
Emotion recognition has become increasingly significant in artificial intelligence; however, the impact of body movements on emotion interpretation remains under-explored. This paper presents a novel Hybrid Bayesian Pre-trained Long Short-Term Memory (HBP-LSTM) framework that combines low-level pose data with high-level kinematic features, utilising Bayesian inference to enhance the accuracy and robustness of emotion recognition. The proposed model is trained on high-quality laboratory data to capture the fundamental patterns of emotional expression through body movements. We introduce noise and employ adversarial attack methods such as the Fast Gradient Sign Method (FGSM) to evaluate the model’s robustness during testing. This approach assesses the HBP-LSTM’s ability to maintain performance under data degradation and adversarial conditions, common challenges in real-world scenarios. We validated the HBP-LSTM on two public datasets, EGBM and KDAEE, demonstrating that the model exhibits high robustness against noise and adversarial perturbations, outperforming traditional models. The HBP-LSTM accurately identifies seven basic emotions (happiness, sadness, surprise, fear, anger, disgust, and neutrality) with accuracies of 98% and 88% on the EGBM and KDAEE datasets, respectively. HBP-LSTM is a noise-resistant model with a reliable emotion recognition framework, which lays the foundation for future applications of emotion recognition technology in more challenging real-world environments.
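The FGSM evaluation the abstract mentions is a one-step attack: perturb the input by a small amount in the direction of the sign of the loss gradient. A minimal sketch on a logistic-regression score, not the HBP-LSTM model itself; all names are illustrative.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.1):
    """Fast Gradient Sign Method against a logistic-regression score.

    Moves x by eps along the sign of the cross-entropy gradient, the
    same one-step attack used to stress-test robustness in testing.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted probability
    grad = (p - y) * w                      # d(cross-entropy loss)/dx
    return x + eps * np.sign(grad)

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 1.0]), 1.0            # correctly classified: score = 1 > 0
x_adv = fgsm_perturb(x, w, b, y, eps=0.6)
# the attack pushes the score toward misclassification
assert w @ x_adv + b < w @ x + b
```

A robustness evaluation then simply compares accuracy on `x` versus `x_adv` over a test set for increasing `eps`.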
Citations: 0
Multi-spatial Semantic Information Aggregation Network for 3D Human Motion Prediction
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.08.002
Dong He , Jianqi Zhong , Jianhua Ji , Wenming Cao
In recent years, GCN-based methods have achieved great success in skeleton-based human motion prediction tasks thanks to the graph structure of the human body. However, existing methods leverage a single kind of semantic information to model the whole motion sequence, which cannot fully exploit motion dependencies. To tackle this issue, we propose a Multi-spatial Semantic Information Aggregation Network (MSIAN) that enriches the semantic information by focusing on the local spatial structure of the human skeleton. MSIAN includes the Graph-based Feature Extraction and Aggregation Block (GFEAB), where the Integration Graph combines local and global attention to extract spatial features, the Gravity-Centered Graph (GCG) captures the state of each joint by treating the central joint of the skeleton as the center of gravity, and the Spatial Position Graph (SPG) fully utilizes the original joint positions to analyze movements. Extensive experiments show that our proposed MSIAN outperforms the current state-of-the-art methods on the Human3.6M, 3DPW, and AMASS datasets. Our code is available at https://github.com/HDdong-hub/MSIAN.
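The preprocessing idea behind the Gravity-Centered Graph, expressing every joint relative to a central joint treated as the center of gravity, amounts to a simple coordinate shift. The sketch below assumes a (frames, joints, xyz) pose tensor and is an illustrative data-preparation step, not the MSIAN network; all names are hypothetical.

```python
import numpy as np

def gravity_centered(joints, center_idx=0):
    """Express each joint relative to a chosen central joint.

    joints: array of shape (frames, n_joints, 3). Subtracting the
    central joint per frame removes global translation, so the
    remaining coordinates describe pose relative to the body's
    assumed center of gravity.
    """
    return joints - joints[:, center_idx:center_idx + 1, :]

pose = np.random.default_rng(1).standard_normal((8, 22, 3))  # 8 frames, 22 joints
rel = gravity_centered(pose, center_idx=0)
assert rel.shape == pose.shape
assert np.allclose(rel[:, 0], 0.0)  # the central joint sits at the origin
```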
Citations: 0
Erratum regarding Declaration of Competing Interest statements in previously published articles
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2024.01.001
Citations: 0
Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection
Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.01.003
Md Shofiqul Islam , Khondokar Fida Hasan , Hasibul Hossain Shajeeb , Humayan Kabir Rana , Md. Saifur Rahman , Md. Munirul Hasan , AKM Azad , Ibrahim Abdullah , Mohammad Ali Moni
This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to scientific understanding and knowledge of the multimodal application of DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy of 99.97% for COVID-19 image data and 93.73% for speech data (i.e., cough). However, the BiGRU model demonstrated superior performance in COVID-19 text classification with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.
Citations: 0
Computer audition for healthcare: A survey on speech analysis
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.10.001
Kun Qian , Zhonghao Zhao , Yang Tan , Weijia Zhang , MinKi Cho , Cuiping Zhu , Fuze Tian , Bin Hu , Yoshiharu Yamamoto , Björn W. Schuller
Intelligent speech analysis (ISA) constitutes a significant component within the realm of computer audition (CA) technology. Speech, as a fundamental tool for human communication, not only conveys rich semantic information but also holds significant potential for various healthcare applications. Computational paralinguistics methods can be used to analyse alterations in the acoustic characteristics of speech signals induced by medical conditions, providing valuable insights into shifts in an individual’s health status. More importantly, compared to other physiological monitoring devices, speech acquisition devices are non-invasive and user-friendly, making them accessible for a wide range of individuals. However, despite its promise, ISA in healthcare currently faces a range of notable challenges that hinder its widespread adoption. In this survey, we present an overview of the development and current research in speech analysis technologies within the healthcare domain. First, we summarise the methodologies employed in ISA-based healthcare. Next, we provide an overview of applications in evaluating physical diseases, mental health conditions, and neurological disorders. Additionally, we discuss key limitations and shortcomings in the current state of the field. Finally, we conclude with a summary of the discussed works and offer insights into future research directions aimed at addressing these limitations to advance the practical implementation of ISA in clinical settings. This survey aims to serve as a valuable resource for researchers in speech analysis, biomedicine, and related fields. We hope to inspire greater interest in this promising area within the scientific community and provide guidance for future studies in this evolving field.
Citations: 0
From tools to partners: How large language models are transforming urban planning
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.11.001
Fangyong Pan , Xinyi Huang , Yuxi Bi , Yunfan Gao , Yu Ye , Haofen Wang
Recent advances in large language models have transformed urban planning from passive tool-assisted workflows to active human–AI collaborative partnerships, enabling natural language-driven design generation, multi-agent stakeholder simulation, and intelligent decision support. This survey systematically examines the integration of LLMs in urban planning, establishing a comprehensive taxonomy covering task categories, technical paradigms, and collaboration patterns. Furthermore, the survey identifies critical evaluation frameworks and benchmark datasets while examining implementation challenges, including domain knowledge integration, scalability constraints, and ethical implications. The work bridges theoretical advances with practical deployment considerations, providing guidance for selecting appropriate LLM approaches across different urban planning contexts and scales.
Citations: 0
Corrigendum regarding updated Declaration of Competing Interest statement in previously published articles
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.12.001
{"title":"Corrigendum regarding updated Declaration of Competing Interest statement in previously published articles","authors":"","doi":"10.1016/j.aiopen.2025.12.001","DOIUrl":"10.1016/j.aiopen.2025.12.001","url":null,"abstract":"","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"6 ","pages":"Page 333"},"PeriodicalIF":14.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145839254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ADAP: Adaptive & Dynamic Arc Padding for predicting seam profiles in Multi-Layer-Multi-Pass robotic welding
IF 14.8 Pub Date : 2025-01-01 DOI: 10.1016/j.aiopen.2025.10.003
He Wang , Sen Li , Xiaobo Liu , Chengxiao Dong , Fang Wan
Welding thick metal plates using Multi-Layer-Multi-Pass (MLMP) techniques demands precise control over the weld seam profile as it evolves during the cooling process. In MLMP welding, typically executed with Gas Metal Arc Welding (GMAW) and shielding gas protection, the continuous deposition of weld beads results in dynamic changes to the seam geometry. These dynamics challenge traditional robotic welding systems that rely on static models. To ensure high-quality joints, real-time adaptation of welding paths requires accurate predictions of weld bead geometry, which in turn guide the estimation of welding positions for adaptive trajectory planning. In this study, we introduce the Adaptive & Dynamic Arc Padding (ADAP) framework. This novel data-driven approach integrates deep learning with an innovative arc-based representation of weld bead profiles. By representing the weld bead geometry through image-derived boundaries and primitive arc parameters (arc center and radius), ADAP establishes a direct link between welding parameters and the evolving weld seam profile. Utilizing datasets generated from Flow-3D simulations of the MLMP process, our framework achieves high-accuracy, real-time predictions: welding positions are estimated within 0.025 s (with an average error of approximately 1.5 mm), and weld seam profiles are predicted in 15 ms, with the arc-based geometric parameters accurately estimated (average errors of 0.73 mm in arc center position and 0.66 mm in radius). This practical approach enhances the efficiency and quality of MLMP robotic welding and contributes to advances in data-driven modeling and intelligent control in manufacturing, paving the way for autonomous welding systems.
{"title":"ADAP: Adaptive & Dynamic Arc Padding for predicting seam profiles in Multi-Layer-Multi-Pass robotic welding","authors":"He Wang ,&nbsp;Sen Li ,&nbsp;Xiaobo Liu ,&nbsp;Chengxiao Dong ,&nbsp;Fang Wan","doi":"10.1016/j.aiopen.2025.10.003","DOIUrl":"10.1016/j.aiopen.2025.10.003","url":null,"abstract":"<div><div>Welding thick metal plates using Multi-Layer-Multi-Pass (MLMP) techniques demands precise control over the weld seam profile as it evolves during the cooling process. In MLMP welding, typically executed with Gas Metal Arc Welding (GMAW) and shielding gas protection, the continuous deposition of weld beads results in dynamic changes to the seam geometry. These challenging traditional robotic welding systems rely on static models. To ensure high-quality joints, real-time adaptation of welding paths requires accurate predictions of weld bead geometry, which in turn guide the estimation of welding positions for adaptive trajectory planning. In this study, we introduce the Adaptive &amp; Dynamic Arc Padding (ADAP) framework. This novel data-driven approach integrates deep learning with an innovative arc-based representation of weld bead profiles. By representing the weld bead geometry through image-derived boundaries and primitive arc parameters (arc center and radius), ADAP establishes a direct link between welding parameters and the evolving weld seam profile. Utilizing datasets generated from Flow-3D simulations of the MLMP process, our framework achieves high-accuracy, real-time predictions: welding positions are estimated within 0.025 s (with an average error of approximately 1.5 mm), and weld seam profiles are predicted in 15 ms, with the arc-based geometric parameters accurately estimated (average errors of 0.73 mm in arc center position and 0.66 mm in radius). 
This practical approach enhances the efficiency and quality of MLMP robotic welding and contributes to advances in data-driven modeling and intelligent control in manufacturing, paving the way for autonomous welding systems.</div></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"6 ","pages":"Pages 204-219"},"PeriodicalIF":14.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145361833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
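The ADAP abstract above describes representing weld bead geometry by primitive arc parameters (arc center and radius) recovered from image-derived boundaries. The paper's actual fitting procedure is not reproduced here; as a rough illustration of that arc-based representation only, the sketch below fits a circle to 2D boundary points with the standard Kåsa algebraic least-squares method. The function name and the synthetic boundary data are assumptions for illustration, not part of the ADAP framework.

```python
import numpy as np

def fit_arc(points):
    """Fit a circle (arc center and radius) to 2D boundary points
    using the Kasa algebraic least-squares method.

    points: (N, 2) array of (x, y) samples along a bead boundary.
    Returns (cx, cy, r).
    """
    x, y = points[:, 0], points[:, 1]
    # A circle satisfies x^2 + y^2 = 2*cx*x + 2*cy*y + c,
    # with c = r^2 - cx^2 - cy^2; solve the linear system A @ [cx, cy, c] = b.
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx**2 + cy**2)
    return cx, cy, r

# Synthetic bead boundary: upper half of a circle centered at (3, 1), radius 2.
theta = np.linspace(0, np.pi, 50)
pts = np.column_stack([3 + 2 * np.cos(theta), 1 + 2 * np.sin(theta)])
cx, cy, r = fit_arc(pts)  # recovers (3, 1) and 2 up to numerical precision
```

On noise-free samples the linear system is exactly consistent, so the fit recovers the generating arc; on real image boundaries the same least-squares solve gives the best algebraic approximation.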