
Engineering Applications of Artificial Intelligence: Latest Articles

A robust and interpretable framework for sports activity recognition based on wearable sensor signals and image representations
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2025.113500
Jian Li, Yibo Fan, Junhui Gong, Junyi Chen, Ruoyu Chen, Wenyan Zhang, Yuliang Zhao
Human activity recognition (HAR) using wearable sensors has advanced rapidly, improving the precision with which complex movements are identified. However, existing methods rely on single-modal time-series features, which limits spatiotemporal representation and the capture of global dependencies. This hinders dynamic characterization and reduces recognition accuracy. To overcome these limitations, this paper proposes a multimodal fusion deep learning approach for complex action recognition. First, a multimodal input strategy integrates time-series data with Gramian Angular Difference Field (GADF) images to capture the spatiotemporal characteristics of motion data. Second, a dual-stream feature fusion network is designed: a Bidirectional Gated Recurrent Unit (BiGRU) combined with a multi-head self-attention mechanism (MSA) extracts time-series features, while Efficient Channel Attention (ECA) and residual blocks enhance the image feature representation, exploiting complementary information across modalities. Finally, an interpretable analysis method based on submodule optimization enables cross-modal attribution analysis, identifying the key regions of both the time-series and image inputs that contribute to model decisions. Experiments show that the proposed method achieves an accuracy of 96.88% on a 16-class sports activity recognition task, significantly outperforming traditional machine learning methods and existing deep learning models. This study provides an effective solution for complex action recognition and lays a technological foundation for real-time motion monitoring in wearable smart devices and broader HAR applications.
Volume 167, Article 113500
Citations: 0
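The GADF input in the sports activity recognition abstract above encodes a 1-D sensor window as a 2-D image. Below is a minimal pure-Python sketch of the standard Gramian Angular Difference Field construction (min-max rescaling to [-1, 1], polar angle φ = arccos x, and G[i][j] = sin(φ_i − φ_j)). The sample window is hypothetical. The abstract does not state the authors' window length, per-axis handling, or image resizing.

```python
import math


def rescale(series):
    """Min-max rescale to [-1, 1], the domain of arccos."""
    lo, hi = min(series), max(series)
    if hi == lo:  # constant window: map every sample to 0
        return [0.0] * len(series)
    return [((x - hi) + (x - lo)) / (hi - lo) for x in series]


def gadf(series):
    """Gramian Angular Difference Field: G[i][j] = sin(phi_i - phi_j)."""
    # Clamping guards against floating-point values a hair outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in rescale(series)]
    return [[math.sin(phi_i - phi_j) for phi_j in phi] for phi_i in phi]


if __name__ == "__main__":
    window = [0.1, 0.4, 0.9, 0.3, -0.2, -0.6]  # hypothetical accelerometer samples
    image = gadf(window)
    print(len(image), len(image[0]))  # 6 6
```

Each GADF entry compares two time steps, so the image exposes temporal correlations that a convolutional stream can read as spatial texture. For multi-axis IMU data, one plausible choice is to stack one GADF per axis as image channels. That choice is an assumption here, not something taken from the paper.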
Structure-informed neural network for predicting fracture-prone regions in laser-deposited aluminum alloys with pores
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2026.113754
Qingyao Yuan, Gang Wang, Zhenyu Han, Xianyue Liu, Longcen Ji, Shilin Li
Hydrogen pores are critical microscale defects in laser-deposited aluminum alloys: they cause local strain concentration and reduce ductility. To predict this structure–property relationship spatially, this study proposes a neural network model based on structural feature extraction and fusion. Al-Mg-Sc specimens were fabricated by coaxial laser-wire directed energy deposition and then characterized sequentially by micro-computed tomography (μ-CT) and Digital Image Correlation (DIC). The pore-structure point clouds and normalized strain maps obtained from these experiments were uniformly segmented into spatially aligned blocks and used as training data. The model integrates PointNet and Convolutional Neural Network (CNN) modules to extract structural features and learn spatial correlations. A composite loss captures both the continuous strain distribution and discrete high-strain regions. With limited data, the model achieved a pixel-level Area Under the Curve (AUC) of 0.69 and a custom distance-weighted AUC (dw-AUC), weighted by spatial proximity, of 0.74. On a full-scale specimen, the model accurately predicted the high-strain regions, one of which coincided with the actual fracture site. Sensitivity analysis shows that a segmentation block size of around 300 μm combined with random point-cloud dropout helps maintain spatial resolution and improves training performance. This work provides a structure-informed modeling approach for predicting damage-prone regions in defect-containing alloys.
Volume 167, Article 113754
Citations: 0
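The structure-informed model above trains on spatially aligned blocks cut from pore point clouds and DIC strain maps, and uses random point-cloud dropout as augmentation. The sketch below illustrates that data preparation under several assumptions. The helper names are hypothetical. Blocks are assumed to tile the x-y plane of the DIC surface. The strain map is assumed to have a hypothetical 10 μm pixel size, with columns running along x. The maximum dropout ratio of 0.5 is illustrative. Only the block size of roughly 300 μm comes from the abstract.

```python
import math
import random
from collections import defaultdict


def segment_points(points, block_size_um=300.0):
    """Group (x, y, z) pore points, in micrometres, into square x-y blocks.

    Assumption: blocks tile the specimen surface imaged by DIC, so only x and y
    decide the block index; z (depth) is kept as a per-point feature.
    """
    blocks = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / block_size_um), math.floor(y / block_size_um))
        blocks[key].append((x, y, z))
    return dict(blocks)


def strain_block(strain_map, key, block_size_um=300.0, pixel_size_um=10.0):
    """Cut the strain-map tile aligned with point-cloud block key = (column, row)."""
    n = round(block_size_um / pixel_size_um)  # pixels per block side
    col, row = key
    return [line[col * n:(col + 1) * n] for line in strain_map[row * n:(row + 1) * n]]


def random_point_dropout(block, max_drop_ratio=0.5, rng=random):
    """Training-time augmentation: drop a random fraction of points, keeping at least one."""
    ratio = rng.uniform(0.0, max_drop_ratio)
    kept = [p for p in block if rng.random() >= ratio]
    return kept if kept else [rng.choice(block)]
```

Because the point-cloud key and the strain tile come from the same floor division by the block size, each (point block, strain tile) pair covers the same surface patch. That pairing is what makes the blocks usable as aligned input-target samples.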
A bi-level neural network scheme for three-dimensional super-resolution elastic wave inversion of high-contrast objects
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2026.113832
Lianmu Chen, Li-Ye Xiao, Mingwei Zhuang, Qing Huo Liu
Elastic wave inversion is a key method for inferring subsurface structures from observational data and is widely applied across many fields. However, traditional three-dimensional (3-D) elastic wave inversion involves complex mathematical formulations and incurs high computational cost. In addition, accurately determining the properties of high-contrast materials is difficult because the inversion process is highly nonlinear. Many convolutional neural network (CNN) methods also struggle with complex or irregular structures, because their fixed sampling grids adapt poorly to varying feature shapes. To address these challenges, we introduce deformable convolutions into elastic wave inversion and develop a bi-level neural network scheme (BLNNS) for super-resolution inversion of 3-D elastic waves in high-contrast objects. The scheme comprises (a) a low-resolution feature extraction level (LRFEL), which uses a 3-D residual network with deformable convolutions to convert scattered-field data into low-resolution inversion images; and (b) a super-resolution enhancement level (SREL), which uses an adapted 3-D U-shaped Network (U-Net) to further refine the inversion images and increase their resolution. Numerical examples demonstrate that the proposed scheme reconstructs irregularly shaped objects more effectively than other neural network methods. Tests at varying noise levels further indicate that the scheme applies equally to high-contrast complex objects in noise-free and noisy environments.
Volume 167, Article 113832
Citations: 0
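The bi-level scheme above relies on deformable convolutions. In a deformable convolution, each kernel tap samples the input at a learned fractional offset, obtained by bilinear interpolation (trilinear in 3-D), so the receptive field can bend around irregular objects. The sketch below is 2-D and single-channel. It takes the offsets as an input instead of predicting them with a separate convolution, as a full deformable layer would. It illustrates the operation only and is not the authors' 3-D residual network. In PyTorch projects, the 2-D operation is available as `torchvision.ops.deform_conv2d`.

```python
import math


def bilinear(img, y, x):
    """Sample img at a fractional (y, x); positions outside the image count as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return value


def deform_conv2d(img, weight, offsets):
    """Single-channel deformable convolution, stride 1, 'same' zero padding.

    offsets[y][x][t] = (dy, dx) shifts kernel tap t at output position (y, x).
    With all offsets zero this reduces to an ordinary (cross-correlation) convolution.
    """
    h, w = len(img), len(img[0])
    k = len(weight)
    r = k // 2
    taps = [(i - r, j - r) for i in range(k) for j in range(k)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for t, (ky, kx) in enumerate(taps):
                dy, dx = offsets[y][x][t]
                acc += weight[ky + r][kx + r] * bilinear(img, y + ky + dy, x + kx + dx)
            out[y][x] = acc
    return out
```

Because the offsets are continuous and bilinear interpolation is differentiable in them, a network can learn the offsets by backpropagation. The interpolation is what lets the sampling grid adapt to irregular shapes.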
A neural network approach to path-detection and self-driving vehicles using You Only Look Once and one-layer neuro-adaptive control
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2025.113633
Sergio López, Miguel Llama, Gibran López
One of the main challenges in modern robotics is giving robots the capacity to learn and make decisions on their own. By improving the precision and performance of robotic systems, vision systems can enable robots to complete complicated tasks in complex work environments. In this work, a navigation and obstacle-detection system is developed and implemented both in simulation and experimentally, using different types of sensors and processing units. The system is built mainly on a 4-wheeled omnidirectional mobile robot, a computer vision system, and deep and shallow artificial neural networks. Its task is to detect and follow a path of arbitrary shape using an on-board camera and artificial neural networks, and then to detect obstacles and act accordingly. Two You Only Look Once (YOLO) models are trained for path detection, while the Tiny You Only Look Once version 2 (Tiny YOLOv2) model is used for object detection. Techniques such as transfer learning, coordinate projection, and trajectory generation algorithms are also used. In addition, a single-layer neuro-adaptive compensation controller based on the filtered error is implemented to control the wheel speeds of the 4-wheeled omnidirectional mobile robot. The single-layer neural network compensates for the robot's unknown nonlinear functions, and its weights are estimated online using suitable adaptive weight-update laws.
Volume 167, Article 113633
Citations: 0
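The abstract above describes a single-layer neuro-adaptive compensator built on a filtered error, but it does not give the robot model or the control law. The sketch below is a hedged illustration in the style of classic neural-network adaptive control. It regulates one wheel modeled as J·ω̇ = τ − f(ω), where the friction f is unknown to the controller. The filtered error is r = e + λ∫e and the control is τ = J(ω̇_d + λe) + K·r + Ŵᵀφ(ω). With these choices the error dynamics become J·ṙ = −K·r + f(ω) − Ŵᵀφ(ω), so the single layer of fixed Gaussian features φ only has to learn the unknown term. All of the following are assumptions: the plant, the friction model, the reference, the gains, the basis layout, and the e-modification term κ|r|Ŵ in the update law. The YOLO perception pipeline is not modeled.

```python
import math


def rbf_features(omega, centers, width):
    """Fixed Gaussian basis: the single layer whose output weights are adapted."""
    return [math.exp(-((omega - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def simulate(adapt=True, T=20.0, dt=1e-3, J=0.1, K=5.0, lam=5.0, gamma=10.0, kappa=0.01):
    """Euler-simulate one wheel, J*domega/dt = tau - f(omega); return speed errors and weights."""
    centers = [-3.0 + 0.5 * i for i in range(13)]  # hypothetical basis layout
    width = 0.5
    w_hat = [0.0] * len(centers)
    omega, z = 0.0, 0.0  # wheel speed and running integral of the speed error
    errors = []
    for k in range(round(T / dt)):
        t = k * dt
        omega_d = 2.0 * math.sin(math.pi * t)  # hypothetical speed reference (rad/s)
        domega_d = 2.0 * math.pi * math.cos(math.pi * t)
        e = omega_d - omega
        r = e + lam * z  # filtered error
        phi = rbf_features(omega, centers, width)
        f_hat = sum(w * p for w, p in zip(w_hat, phi))  # NN estimate of the unknown term
        tau = J * (domega_d + lam * e) + K * r + f_hat
        # Friction known only to the simulated plant, never to the controller.
        f = 0.5 * omega + 0.3 * math.tanh(10.0 * omega)
        omega += dt * (tau - f) / J
        z += dt * e
        if adapt:  # W_hat' = gamma * (phi * r - kappa * |r| * W_hat), an e-modification law
            w_hat = [w + dt * gamma * (p * r - kappa * abs(r) * w) for w, p in zip(w_hat, phi)]
        errors.append(e)
    return errors, w_hat
```

Calling `simulate(adapt=False)` runs the same loop with the weights frozen at zero. This gives a baseline for seeing what the adaptive term contributes.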
Multi-scale fusion global perception network for gastrointestinal disease classification
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2026.113831
Sheng Li, Yulin Yu, Xiongxiong He
With the rising incidence of gastrointestinal diseases, improving the accuracy of automated diagnosis has become a critical research focus in medical image analysis. Efficient and precise diagnostic techniques advance the interdisciplinary development of computer vision and medical imaging. They also play a vital role in early detection and personalized treatment in clinical practice. We propose a multi-scale fusion-based global perception network that adapts better to diverse gastrointestinal lesions and to variations in image quality, while improving generalization and recognition accuracy. A multi-scale cross-fusion attention module strengthens the model's ability to detect diverse lesions and lesion areas. A global perception module enables interaction across different image regions and captures long-range dependencies, mitigating high inter-class similarity and large intra-class variance. In addition, a feature fusion framework integrates shallow and deep features so that both fine image details and global context are used. An adaptive feature selection mechanism further lets the network adjust to different lesion types, capturing complex gastrointestinal features and overcoming the limitations of single-scale representations. We evaluated the method on a private five-class small intestine dataset and on four public datasets: the five-class Kvasir-Capsule, three-class Kvasir, three-class Hyper-Kvasir, and four-class Piccolo datasets. The method outperformed the compared methods, achieving overall classification accuracies of 97.58%, 98.33%, 97.17%, 95.22%, and 94.44% on these datasets, respectively. These results indicate strong generalization across datasets and clinical scenarios and support the practical effectiveness of the model in complex clinical imaging. They also provide a foundation for the future design and deployment of intelligent gastrointestinal disease diagnosis systems.
Volume 167, Article 113831
Citations: 0
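The abstract above does not say how its adaptive feature selection mechanism weights features of different scales. One widely used way to let a network choose between scales per channel is selective-kernel attention in the style of SKNet. It sums the branch outputs, pools them into a channel descriptor, and computes softmax weights across branches. The pure-Python sketch below shows that swapped-in technique and is not the authors' module. The branch contents and the random weights standing in for trained fully connected layers are hypothetical, and batch normalization is omitted.

```python
import math
import random


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fusion(branches, w_reduce, w_select):
    """SKNet-style selective fusion of B same-shaped branch outputs.

    branches[b][c] is channel c of branch b as a flat list of pixels.
    w_reduce is a d x C matrix; w_select[b] is a C x d matrix for branch b.
    Returns the fused channels and attn[c][b], the weight of branch b in channel c.
    """
    nb, nc = len(branches), len(branches[0])
    npix = len(branches[0][0])
    # 1) Sum the branches, then global-average-pool each channel to one number.
    s = [sum(sum(branches[b][c]) for b in range(nb)) / npix for c in range(nc)]
    # 2) Compact descriptor z = ReLU(w_reduce @ s).
    z = [max(0.0, sum(row[c] * s[c] for c in range(nc))) for row in w_reduce]
    # 3) Per-branch logits for every channel; softmax across branches.
    logits = [[sum(w_select[b][c][i] * z[i] for i in range(len(z))) for c in range(nc)]
              for b in range(nb)]
    attn = [softmax([logits[b][c] for b in range(nb)]) for c in range(nc)]
    # 4) Channel-wise convex combination of the branches.
    fused = [[sum(attn[c][b] * branches[b][c][p] for b in range(nb)) for p in range(npix)]
             for c in range(nc)]
    return fused, attn


def random_matrix(rows, cols, rng):
    """Hypothetical random weights standing in for trained fully connected layers."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    C, d, P = 4, 2, 9  # channels, reduced width, pixels (a flattened 3x3 map)
    small_scale = [[rng.random() for _ in range(P)] for _ in range(C)]  # e.g. 3x3-kernel branch
    large_scale = [[rng.random() for _ in range(P)] for _ in range(C)]  # e.g. 5x5-kernel branch
    fused, attn = selective_fusion([small_scale, large_scale], random_matrix(d, C, rng),
                                   [random_matrix(C, d, rng) for _ in range(2)])
    print([round(a[0], 3) for a in attn])  # per-channel weight of the small-scale branch
```

The softmax runs across branches, not across channels. As a result, every output channel is a convex combination of the same channel at different scales, and the mix can differ from channel to channel.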
Neural style transfer architectures for improving generalization in low-resource spoken language identification
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2026.113720
Spandan Dey, Goutam Saha
To address the scarcity of multilingual labeled speech data in low-resource spoken language identification (LID), this work proposes using neural style transfer (NST) architectures in LID. NST, used predominantly in computer vision, synthesizes images by blending the texture of a style image into a content image. Although NST is gaining attention in generative audio applications, its potential as a source of augmentation for speech-based classification tasks is rarely explored. To our knowledge, this is one of the first works to apply NST to LID training by generating synthetic augmented audio. As the baseline NST, we choose a shallow, wide, random convolutional neural network (CNN) layer developed for music generation. We first optimize NST hyper-parameters, such as CNN layer count, channel size, and input spectrogram dimensions. We then propose three key architectural enhancements to the NST framework: (i) replacing the random CNN layer with a pre-trained LID encoder, (ii) a dual spectro-temporal attentive audio style extraction block for effective texture capture, and (iii) a weighted residual connection that balances style and content information. The NST-generated audio is then used both as a novel audio augmentation technique and as a pseudo-domain for adversarial domain generalization. This improves same-corpus and cross-corpus LID performance on three prominent South Asian LID corpora and on the Language Recognition Evaluation (LRE) 2022 challenge data. To further exploit NST in LID, we perform score fusion of LID models based on the top-3 and top-5 best-performing NST hyper-parameter sets. The NST-based augmentation outperforms several other widely used augmentation methods for LID.
Volume 167, Article 113720
Citations: 0
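Neural style transfer, as described above, matches a generated signal's raw features to the content input and its feature correlations (Gram matrices) to the style input. The sketch below shows those two losses for features laid out as channels × time frames, which is how CNN activations on a spectrogram would be arranged. It is a generic Gatys-style formulation under assumptions: the alpha and beta weights are hypothetical, and the feature extractor and the optimization of the generated spectrogram are omitted. It does not reproduce the authors' dual spectro-temporal attention block or weighted residual connection.

```python
def gram(features):
    """Channel correlations averaged over time frames.

    features: C channels, each a list of T frame values (e.g. CNN activations on a spectrogram).
    Summing over frames discards *when* a pattern occurs and keeps *which* channels
    co-activate, which is why the Gram matrix serves as a texture (style) descriptor.
    """
    c, t = len(features), len(features[0])
    return [[sum(features[i][k] * features[j][k] for k in range(t)) / t for j in range(c)]
            for i in range(c)]


def mse(a, b):
    """Mean squared difference between two equally shaped matrices."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def nst_loss(generated, content, style, alpha=1.0, beta=100.0):
    """alpha * content loss on raw features + beta * style loss on Gram matrices."""
    return alpha * mse(generated, content) + beta * mse(gram(generated), gram(style))
```

Dividing by the number of frames means a style clip whose length differs from the content clip still yields a Gram matrix on a comparable scale. The content term, by contrast, requires the generated and content features to have the same number of frames.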
A sustainable multi-period hub location problem with uncertain flows and capacity: A hybrid solution approach using interval type-II fuzzy approximation
IF 8 | Zone 2 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2026-01-12 | DOI: 10.1016/j.engappai.2025.113590
Zahra Shakeri, Asef Nazari, Mohadese Ghasemi, Dhananjay Thiruvady, Reza Shahabi-Shahmiri, Mohammad Ghasemi
The hub location problem is a critical challenge in supply systems that rely heavily on the efficient transport of goods, information, and passengers across interconnected components. This work presents an integrated mixed-integer linear programming model with three objectives for designing a multi-period hub system with sustainability dimensions. Over the planning horizon, the objectives are to minimize total costs, minimize energy consumption, and maximize job opportunities. Hub facilities can be either permanent or temporary and can use traditional or modern technology, so the proposed model applies to a range of hub network design problems, especially under disruption conditions. Hub edges can likewise be permanent or temporary. Hub capacities and flows are modeled as interval type-2 fuzzy variables. To solve the multi-objective model under fuzzy uncertainty, a solution approach combining chance-constrained programming, the augmented ε-constraint method, and VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR), referred to as AUGMECON2VIKOR, is employed. Its computational performance is compared with the augmented ε-constraint method (AUGMECON2) and the standard ε-constraint method on the well-known AP dataset, with instances ranging from 10 to 50 nodes under crisp conditions. Results are also obtained for various uncertainty levels across different problem instance sizes. Finally, comprehensive sensitivity analyses over different scenarios assess the impact of the sustainability dimensions. The results indicate that AUGMECON2VIKOR outperforms AUGMECON2 over the planning horizon and, when sustainability and job opportunities are considered simultaneously, provides better Pareto solutions than the other scenarios.
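AUGMECON2VIKOR pairs a Pareto generator (AUGMECON2) with a compromise-ranking step (VIKOR). The sketch below illustrates both on a toy bi-objective problem. It assumes the candidate solutions can be enumerated as a finite list of objective vectors (the paper instead solves a fuzzy mixed-integer program with a solver), both objectives are maximised (a cost would be negated), VIKOR uses equal weights and v = 0.5, and VIKOR's acceptance conditions are omitted. The chance-constraint and type-2 fuzzy treatment are not modelled, and the function names are hypothetical.

```python
import numpy as np


def augmecon2(points, n_grid=10, delta=1e-3):
    """Augmented eps-constraint (AUGMECON2) over an enumerable set of solutions.

    points: (n, 2) objective values, both maximised.
    Solves max f1 + delta * s2 / r2  s.t.  f2 - s2 = e,  s2 >= 0  for a grid of e,
    and skips grid points with the AUGMECON2 bypass coefficient floor(s2 / step).
    """
    pts = np.asarray(points, dtype=float)
    lo, hi = pts[:, 1].min(), pts[:, 1].max()
    r2 = (hi - lo) or 1.0                   # range of the constrained objective
    step = r2 / n_grid
    chosen, k = set(), 0
    while k <= n_grid:
        e = lo + k * step
        slack = pts[:, 1] - e
        feasible = slack >= -1e-12
        if not feasible.any():
            break
        score = np.where(feasible, pts[:, 0] + delta * slack / r2, -np.inf)
        best = int(np.argmax(score))
        chosen.add(best)
        k += int(max(slack[best], 0.0) // step) + 1   # bypass redundant grid points
    return sorted(chosen)


def vikor(matrix, weights=None, v=0.5):
    """VIKOR Q scores for alternatives (rows) over benefit criteria (columns); lower is better."""
    f = np.asarray(matrix, dtype=float)
    w = np.full(f.shape[1], 1.0 / f.shape[1]) if weights is None else np.asarray(weights, float)
    best, worst = f.max(axis=0), f.min(axis=0)
    span = np.where(best > worst, best - worst, 1.0)
    regret = w * (best - f) / span                     # weighted normalised distance to the ideal
    S, R = regret.sum(axis=1), regret.max(axis=1)      # group utility, individual regret

    def norm(x):
        return (x - x.min()) / (x.max() - x.min()) if x.max() > x.min() else np.zeros_like(x)

    return v * norm(S) + (1 - v) * norm(R)
```

For the points (10, 1), (8, 4), (5, 6), (3, 5), (1, 9), (7, 3), `augmecon2` returns indices `[0, 1, 2, 4]` (the two dominated points are skipped), and `vikor` on that front gives the lowest Q to (8, 4), a balanced compromise rather than either extreme.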
Citations: 0
Symmetric Positive Definite manifold deep metric learning for bearing fault diagnosis
IF 8; Zone 2 (Computer Science); Q1 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2026-01-12; DOI: 10.1016/j.engappai.2026.113821
Junshi Cheng, Ruisheng Ran, Bin Fang, Benchao Li
Rolling bearings are essential components in rotating machinery, and their failures can lead to unplanned downtime, substantial economic losses, and even severe safety risks. Consequently, bearing fault diagnosis has become a crucial task in modern industry, with machine learning methods playing a central role. However, many existing methods are designed in Euclidean space, which limits their ability to capture nonlinear features in bearing signals. Additionally, they often have excessive parameters and irrelevant features, making it difficult to learn the correct data distribution. To address these challenges, this paper proposes a supervised Symmetric Positive Definite (SPD) manifold deep metric learning method for bearing fault diagnosis. The method transforms the original data into an SPD manifold and constructs an SPD sparse denoising autoencoder for feature extraction. A multi-class N-pair loss term on the SPD manifold is then used to improve classification ability. Comprehensive experiments show that the method is robust to noise and has low computational complexity. Owing to the nonlinear expressive power of SPD manifolds, the method improves classification accuracy and outperforms existing Euclidean-space methods.
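The abstract does not say how signals are mapped onto the SPD manifold or which Riemannian metric is used. A common recipe, assumed here, is to take the covariance of a delay-embedded vibration segment (regularised so it is strictly positive definite), compare matrices with the log-Euclidean metric, and flatten the matrix logarithm into a tangent-space vector. The loss shown is the standard multi-class N-pair formulation, in which each class contributes one anchor/positive pair and the other pairs act as negatives. Embedding dimension, regularisation and all function names are hypothetical, and the sparse denoising autoencoder is not modelled.

```python
import numpy as np


def signal_to_spd(x, dim=8, eps=1e-6):
    """Covariance of a delay-embedded 1-D segment, regularised to be strictly SPD."""
    x = np.asarray(x, dtype=float)
    n = len(x) - dim + 1
    emb = np.stack([x[i:i + n] for i in range(dim)])    # (dim, n) delay embedding
    emb = emb - emb.mean(axis=1, keepdims=True)
    return emb @ emb.T / (n - 1) + eps * np.eye(dim)


def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.log(vals)) @ vecs.T


def log_euclidean_distance(A, B):
    """Log-Euclidean distance ||log(A) - log(B)||_F between SPD matrices."""
    return float(np.linalg.norm(spd_log(A) - spd_log(B), "fro"))


def tangent_vector(S):
    """Vectorise log(S): diagonal plus sqrt(2)-scaled upper triangle, so the
    Euclidean norm of the vector equals the Frobenius norm of log(S)."""
    L = spd_log(S)
    iu = np.triu_indices(L.shape[0], k=1)
    return np.concatenate([np.diag(L), np.sqrt(2.0) * L[iu]])


def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss: row i of each array is one class; other rows act as negatives."""
    logits = anchors @ positives.T
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_norm = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(log_norm - np.diag(logits)))
```

The log map turns the curved SPD cone into a flat vector space, so ordinary dot products and losses can be applied to `tangent_vector` outputs while distances still respect the manifold geometry (for example, they are invariant to a common orthogonal change of basis).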
Citations: 0
Optimised Canny edge detection algorithm for medical image feature mapping and extraction
IF 8; Zone 2 (Computer Science); Q1 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2026-01-12; DOI: 10.1016/j.engappai.2026.113763
A.E. Emmanuel, K.A. Amusa, T.C. Erinosho, M.T. Raji
The conventional Canny Edge Detector (CED) often performs poorly when confronted with variability, noise and low contrast, which are prevalent challenges in medical imaging. To address these limitations, this study presents an Optimised Canny Edge Detection (OCED) algorithm for enhanced extraction of medical image features. The proposed method integrates Adaptive Histogram Equalisation (AHE) into the processing stage to improve local contrast and reveal subtle structural details before edge detection. In addition, a hybrid ant colony and bee colony optimisation strategy is embedded within the CED framework to dynamically tune key parameters, thereby improving threshold adaptability, edge sharpness and structural feature preservation across diverse imaging conditions. The OCED algorithm was evaluated using a dataset of 220 greyscale medical images comprising 55 images each from X-ray, computed tomography (CT), magnetic resonance imaging (MRI) and ultrasound modalities, sourced from Kaggle and Radiopaedia. All images underwent intensity normalisation, noise reduction, and resizing before being processed by the OCED pipeline. Performance was assessed using both qualitative inspection of pictorial outputs from the algorithmic pipeline and quantitative metrics, including accuracy, mean squared error (MSE) and execution time. The experimental results showed consistently high accuracy values (0.9937–0.9951), very low MSE (down to 0.0049) and execution times typically below 50 ms per image. Benchmarking against state-of-the-art methods further demonstrated that OCED provides superior edge quality and improved computational efficiency relative to existing models. These findings establish OCED as a robust, adaptive and computationally efficient framework for feature mapping and boundary extraction across multiple medical imaging modalities.
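OCED tunes Canny's parameters with a hybrid ant/bee colony optimiser and scores edge maps by accuracy and MSE. The sketch below isolates two pieces under stated assumptions. The first is Canny's double-threshold hysteresis step, with the gradient magnitude assumed to be precomputed and non-maximum suppression and AHE preprocessing omitted. The second is a plain random search over (low, high) that stands in for the ant/bee colony hybrid, which is not reproduced here. Function names are hypothetical. The metrics also show that, for binary edge maps, pixel accuracy and MSE always sum to 1. This is consistent with the reported best accuracy of 0.9951 and lowest MSE of 0.0049, assuming the paper computes both on binary maps.

```python
import numpy as np


def dilate8(mask):
    """Binary dilation with a 3x3 (8-connected) neighbourhood."""
    p = np.pad(mask, 1)                     # pads a bool array with False
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def hysteresis(mag, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) connected to them."""
    weak = mag >= low
    edges = mag >= high
    while True:
        grown = dilate8(edges) & weak
        if np.array_equal(grown, edges):
            return edges
        edges = grown


def pixel_accuracy(pred, ref):
    return float(np.mean(pred == ref))


def mse(pred, ref):
    return float(np.mean((pred.astype(float) - ref.astype(float)) ** 2))


def tune_thresholds(mag, ref, n_iter=200, seed=0):
    """Random search over (low, high); a stand-in for a swarm optimiser."""
    rng = np.random.default_rng(seed)
    best = (None, -1.0)
    for _ in range(n_iter):
        low, high = np.sort(rng.uniform(mag.min(), mag.max(), size=2))
        acc = pixel_accuracy(hysteresis(mag, low, high), ref)
        if acc > best[1]:
            best = ((float(low), float(high)), acc)
    return best
```

Hysteresis is what lets a weak but continuous boundary survive while isolated weak responses (typically noise) are dropped. The threshold pair is therefore the natural target for a metaheuristic tuner.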
Citations: 0
Enhanced diagnosis of broken rotor bar faults using discrete wavelet transform and convolutional neural network
IF 8; Zone 2 (Computer Science); Q1 AUTOMATION & CONTROL SYSTEMS; Pub Date: 2026-01-12; DOI: 10.1016/j.engappai.2026.113784
Pankaj Chauhan, Kunal Dewangan, Ramnivas Kumar, Sachin Kumar Singh
In this work, a novel hybrid approach is proposed that integrates the discrete wavelet transform (DWT) with convolutional neural networks (CNNs) for detecting broken rotor bar (BRB) faults in a squirrel-cage induction motor (SCIM). The DWT is used for signal decomposition and feature extraction, while the CNN performs fault classification and diagnosis. Voltage signals recorded under varying load conditions are used as input to the algorithm. First, the DWT is employed to extract relevant fault features. Two input representations are then considered for CNN-based classification. In the first case, time-domain images are created from the DWT detail coefficients. In the second case, frequency-domain images are obtained from the spectrum of these DWT coefficients. The CNN model is trained with hyperparameter tuning and k-fold cross-validation to optimize classification performance. Experimental validation on an IEEE Dataport dataset encompassing five fault categories and eight load levels demonstrates that the proposed method achieves a classification accuracy of approximately 99 %, outperforming traditional raw-signal CNN models. The results further indicate that time-domain images from the fourth-level DWT coefficients and frequency-domain images from the fifth-level DWT coefficients provide the highest accuracy, reaching up to 99 %. A comparison of the two input representations shows that the frequency-domain images offer slightly more robust performance. Performance indicators, including precision, recall, and F1-score, confirm the reliability of the proposed model. Overall, the findings establish the effectiveness of combining DWT-based feature extraction with deep learning for BRB fault diagnosis in SCIMs.
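The pipeline's first stage splits a voltage signal into per-level detail coefficients and, for the frequency-domain variant, takes their spectrum. The NumPy sketch below shows that stage under stated assumptions. It uses an orthonormal Haar wavelet (the abstract does not name the mother wavelet) and power-of-two signal lengths. Turning the arrays into the images fed to the CNN is not shown, and the function names are hypothetical. With PyWavelets installed, `pywt.wavedec(x, "haar", level=5)` would be the usual library route and returns coefficients in the same `[cA_L, cD_L, ..., cD_1]` order.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                          # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_wavedec(x, level):
    """Multi-level decomposition ordered like pywt.wavedec: [cA_L, cD_L, ..., cD_1]."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(level):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return [approx] + details[::-1]


def detail_spectrum(detail):
    """One-sided magnitude spectrum of a detail-coefficient sequence (mean removed)."""
    return np.abs(np.fft.rfft(detail - detail.mean()))
```

With `coeffs = haar_wavedec(x, 5)`, `coeffs[2]` is the level-4 detail sequence (the source of the best time-domain images in the abstract) and `coeffs[1]` is the level-5 one, whose `detail_spectrum` corresponds to the best frequency-domain input. Each level halves the sequence length and the frequency band, which is why different fault signatures concentrate at different levels.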
Citations: 0