Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.114093
Elías Herrero Jaraba, Eduardo Martínez Carrasco, Anibal Antonio Prada Hurtado, María Teresa Villen Martínez, Guillermo Rios Gómez, David Hernando Polo, Julio David Buldain Pérez
This paper presents a hybrid deep learning model for fault detection in power transformers, addressing the limitations of conventional protection schemes under transient operating conditions. The proposed model, TransInception, integrates InceptionTime for efficient feature extraction in multivariate time series and Gated Transformer for capturing dependencies between variables. The architecture is modified by replacing the original gating mechanism with a linear double-layer output and removing a bottleneck layer responsible for handling temporal dependencies. The dataset used for training and testing was generated in a real-time digital simulation (RTDS) environment, consisting of an external grid, a delta-wye transformer, and a dynamic load. After training, the hybrid deep learning model was validated in a test grid specifically designed for this stage, where a parallel transformer configuration was implemented. This validation allowed for the evaluation of its performance in classifying internal, external, and no-fault conditions, as well as assessing cases of current transformer saturation. Additionally, sympathetic inrush conditions were studied to analyse the model’s response to interactions between power transformers. As future work, efforts will focus on improving the model’s adaptability to transient conditions and optimising its computational efficiency for deployment in substation protection systems.
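The InceptionTime component extracts features by applying convolutions with several kernel lengths in parallel over the multivariate current signals, so short transients and long inrush patterns are captured at once. The following is a minimal NumPy sketch of that multi-kernel idea only, not the authors' TransInception implementation; the kernel sizes, random weights, and signal dimensions are illustrative.

```python
import numpy as np

def conv1d_same(x, kernel):
    """'Same'-length 1-D convolution of each row (channel) of x."""
    pad = len(kernel) // 2
    out = np.empty_like(x, dtype=float)
    for c in range(x.shape[0]):
        xp = np.pad(x[c], pad, mode="edge")
        out[c] = np.convolve(xp, kernel, mode="valid")
    return out

def inception_block(x, kernel_sizes=(3, 9, 19)):
    """Parallel convolutions with different receptive fields, concatenated
    along the channel axis (a simplified InceptionTime-style block)."""
    rng = np.random.default_rng(0)
    branches = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / k  # illustrative random weights
        branches.append(conv1d_same(x, kernel))
    return np.concatenate(branches, axis=0)

# Three-phase current signals: 3 channels x 256 samples
signals = np.random.default_rng(1).standard_normal((3, 256))
features = inception_block(signals)
print(features.shape)  # (3 branches x 3 channels, 256) = (9, 256)
```

In a trained model the kernels would be learned and followed by the transformer stage; here they only show how the multi-scale branches keep the time axis aligned so their outputs can be stacked.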
Title: Hybrid Inception-Transformer model for signals classification: The case of electrical faults in power transformers. Engineering Applications of Artificial Intelligence, vol. 168, Article 114093.
Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.114014
Mohammad Alamgir Hossain, ZhongFu Ye, Md. Bipul Hossen, Md. Atiqur Rahman, Md Shohidul Islam, Md. Ibrahim Abdullah
Image captioning is a challenging task that requires a deep understanding of both visual and linguistic modalities to generate accurate and meaningful descriptions. Traditional methods often struggle to effectively integrate object-level and global scene features, leading to limited contextual awareness in generated captions. To address this, we propose a novel Hierarchical Region-Context Attention for Image Captioning framework that combines a Region-Context Attention Network for multi-scale visual feature fusion with a Hierarchical Attention-Based context encoding mechanism for refined representation learning. The Region-Context and Hierarchical Attention module extracts object-level features using Faster Region-based Convolutional Neural Network and global context features from Residual Networks, integrating them through a multi-head attention mechanism. This fusion enables localized object representations to be enriched with scene-level semantics. The fused visual features are further refined using a hierarchical attention-based approach, which employs both spatial and channel-wise attention to emphasize semantically relevant information across regions and dimensions. The decoder is implemented using a hierarchical Long Short-Term Memory network that generates captions in an autoregressive manner, leveraging the hierarchical attention-based refined features to guide each word prediction. This structure enables the model to maintain temporal coherence while dynamically attending to informative visual content. We evaluate our model on the Microsoft Common Objects in Context 2014 dataset, achieving a Bilingual Evaluation Understudy score of 40.0 and a Consensus-based Image Description Evaluation score of 132.5, surpassing state-of-the-art models. Results indicate that the model effectively captures object details and context, producing more coherent and accurate captions. The code for this project is publicly available at https://github.com/alamgirustc/HRcAIC.
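The hierarchical refinement described above weights features along two axes: channel-wise attention emphasizes informative feature dimensions, and spatial attention emphasizes informative regions. A minimal NumPy sketch of that two-level gating follows; the scoring functions (simple means passed through a softmax) are illustrative stand-ins, not the paper's learned attention layers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feats):
    """Weight each channel by a softmax over its average response."""
    pooled = feats.mean(axis=0)             # (C,) pooled over regions
    return feats * softmax(pooled)          # broadcast channel weights

def spatial_attention(feats):
    """Weight each region by a softmax over its mean activation."""
    scores = feats.mean(axis=1)             # (R,) one score per region
    return feats * softmax(scores)[:, None]

def hierarchical_refine(feats):
    """Channel gating followed by spatial gating: a two-level refinement."""
    return spatial_attention(channel_attention(feats))

# 36 region features (e.g. detected objects) of dimension 512
regions = np.random.default_rng(0).standard_normal((36, 512))
refined = hierarchical_refine(regions)
print(refined.shape)  # (36, 512)
```

The refined features keep their shape, so they can feed the LSTM decoder unchanged while semantically relevant regions and channels carry larger magnitudes.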
Title: Hierarchical Region-Context Attention for image captioning. Engineering Applications of Artificial Intelligence, vol. 168, Article 114014.
Ensuring the safety and efficiency of Autonomous Vehicles (AVs) necessitates highly accurate perception, especially for lane detection and lane-change manoeuvres. Among object detection frameworks, “You Only Look Once” (YOLO) algorithms have emerged as prominent contenders due to their rapid inference and commendable accuracy. However, the broad spectrum of YOLO variants and their applications in complex, real-world environments remain insufficiently mapped, necessitating a more integrative and critical perspective than what is typically offered by surveys. This comprehensive review synthesizes theoretical foundations, architectural innovations, and empirical evaluations of YOLO-based algorithms in AV-related tasks. It not only highlights key findings—such as the notable gains in real-time detection and adaptability to a range of driving conditions—but also explicitly identifies persistent gaps and limitations. These include difficulties in detecting subtle or degraded lane markings, handling unpredictable environmental factors like adverse weather and varied lighting, mitigating adversarial perturbations, and scaling effectively across diverse datasets and geographic regions. By critically examining these vulnerabilities, we illuminate the opportunities for refining YOLO's training paradigms, optimizing model architectures, incorporating sensor fusion, and fostering universally applicable datasets. The implications of addressing these gaps extend beyond mere technical refinements. Proactively tackling YOLO's current challenges can expedite the realization of safer, more robust, and globally adaptable AV navigation systems. In doing so, this review provides clear, actionable insights for researchers, engineers, and policymakers, guiding them toward strategic innovations that will strengthen AV perception and contribute to more reliable, future-ready transportation solutions.
Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.113893
Busuyi Omodaratan, Ali Jamali, Timothy Wiley, Ziad Al-Saadi, Rammohan Mallipeddi, Ehsan Asadi, Hoshyar Asadi, Rasoul Sadeghian, Sina Sareh, Hamid Khayyam
Title: Advances in You Only Look Once (YOLO) algorithms for lane and object detection in autonomous vehicles. Engineering Applications of Artificial Intelligence, vol. 168, Article 113893.
Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.114059
Zhiyong Chen, Zhao Yang, Peng Liu, Feng Wang, Yamei Dou
Based on powerful convolutional neural networks (CNNs) and complex model structures, semantic segmentation achieves good segmentation accuracy, but its slow inference speed limits its use in practical applications such as autonomous driving and medical diagnosis. Real-time semantic segmentation has therefore received increasing attention. However, most existing real-time semantic segmentation methods improve inference speed while significantly sacrificing segmentation precision. Striking a good balance between inference speed and precision remains a major issue in real-time semantic segmentation. To address this issue, we propose a real-time semantic segmentation network, the Multi-Shape Enhancement Pyramid Network (MSEPNet). First, we propose an efficient spatial inverted residual (ESIR) module to effectively extract multi-scale spatial information. Next, to capture multi-scale semantic information while maintaining efficient inference speed, we introduce an efficient contextual residual (ECR) module. Finally, we present the multi-shape enhancement pyramid (MSEP) module to capture multi-scale and multi-shape contextual information. The proposed MSEPNet achieves competitive results on street scene datasets. Specifically, with only 1.04 million (1.04M) parameters, it achieves 76.7% and 72.5% mean Intersection over Union (mIoU) at 144.4 and 108.9 Frames Per Second (FPS) on the Cityscapes and Cambridge-driving Labeled Video Database (CamVid) test sets, respectively. Furthermore, we conduct additional experiments on the Stanford Background dataset to verify the robustness of MSEPNet in diverse real-world environments, demonstrating its generalization ability beyond standard benchmarks.
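Inverted residual blocks of the kind the ESIR module builds on keep inference cheap by expanding channels with a pointwise transform, applying a lightweight nonlinearity in the wide space, then projecting back and adding the input. Below is a minimal NumPy sketch of that expand-project-residual pattern; the random weights are illustrative, and the depthwise convolution used in real inverted residuals is omitted for brevity.

```python
import numpy as np

def inverted_residual(x, expand=4, seed=0):
    """Expand channels, apply a cheap nonlinearity, project back, and add
    the input (a simplified inverted-residual block, depthwise conv omitted)."""
    rng = np.random.default_rng(seed)
    c, t = x.shape                                        # channels x time/width
    w_up = rng.standard_normal((expand * c, c)) * 0.1     # 1x1 expansion
    w_down = rng.standard_normal((c, expand * c)) * 0.1   # 1x1 projection
    h = np.maximum(w_up @ x, 0.0)                         # expand + ReLU
    return x + w_down @ h                                 # project + residual

x = np.random.default_rng(1).standard_normal((16, 64))
y = inverted_residual(x)
print(y.shape)  # (16, 64): shape preserved by the residual connection
```

Because both pointwise maps are plain matrix multiplications, the block's cost grows linearly with the expansion factor, which is the kind of speed/precision dial the paper tunes.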
Title: Multi-shape enhancement pyramid network for real-time semantic segmentation. Engineering Applications of Artificial Intelligence, vol. 168, Article 114059.
Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.114067
Fan Yang, Zhi Zheng, Xiaolan Pan, Zhongyao Lin, Pengkun Zhang
The seismic safety assessment of nuclear power plants (NPPs) fundamentally relies on accurate floor response spectra (FRS). Conventional generation methods, such as nonlinear time history analysis (NLTHA), are computationally prohibitive, while direct spectra-to-spectra methods struggle with structural nonlinearity and single ground-motion input. To overcome these limitations, this study proposes a novel deep learning framework, the multi-head attention-based convolutional bidirectional long short-term memory network (MAC-BiLSTM), for efficient and accurate FRS prediction in NPPs. This architecture strategically integrates convolutional neural networks (CNNs) for local feature extraction, bidirectional long short-term memory (BiLSTM) for modeling long-term temporal dependencies, and a multi-head attention (MHA) mechanism for dynamically weighting critical spectral features. A comprehensive FRS dataset is generated via NLTHA using a lumped-mass stick model of an NPP, explicitly incorporating structural uncertainties, and subjected to a suite of near-fault ground motions. Results demonstrate that the MAC-BiLSTM model achieves competitive accuracy across critical NPP nodes while offering a computational speedup of approximately 120 times compared to conventional NLTHA. Sensitivity analyses further confirm the model's robustness across varying seismic intensities, frequency contents, and ground motion durations. Comparative studies against a suite of baseline machine learning models highlight the superiority of MAC-BiLSTM and underscore the critical role of the MHA mechanism in capturing short-period-dominated FRS characteristics unique to NPP structures. This work provides a powerful data-driven tool for the rapid seismic performance assessment and safety design of acceleration-sensitive nonstructural components in NPPs.
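The multi-head attention (MHA) stage re-weights spectral features by letting each position attend to every other, with the feature dimension split across heads. The NumPy sketch below shows the standard scaled dot-product mechanics only; it is not the paper's MAC-BiLSTM layer, and the random projection weights and per-head output projection are simplified assumptions.

```python
import numpy as np

def multi_head_attention(x, num_heads=4, seed=0):
    """Scaled dot-product self-attention with the feature dimension split
    across heads (simplified: no learned output projection)."""
    rng = np.random.default_rng(seed)
    t, d = x.shape
    dh = d // num_heads
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    heads = []
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh)   # (T, T) similarity
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)       # row-wise softmax
        heads.append(attn @ v[:, sl])                  # weighted values
    return np.concatenate(heads, axis=-1)              # back to (T, d)

seq = np.random.default_rng(1).standard_normal((50, 32))  # 50 steps, 32 features
out = multi_head_attention(seq)
print(out.shape)  # (50, 32)
```

Each head can specialize in a different spectral band, which is the intuition behind using MHA to emphasize the short-period region of the FRS.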
Title: Efficient deep learning-based prediction of floor response spectra for nuclear power plants using a multi-head attention-based convolutional bidirectional long short-term memory network. Engineering Applications of Artificial Intelligence, vol. 168, Article 114067.
Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.114050
Yuchen He, Xi Li, Lijuan Qian, JiaWei Lu
Wind power is a crucial form of clean energy, but its reliability is seriously affected by faults in the gearboxes of wind power generation systems. To address this challenge, a novel spatiotemporal incremental broad learning network with pseudo-label-driven adaptation (SIBN-PLDA) is proposed in this paper for the fault diagnosis of wind turbine gearboxes during multi-condition non-stationary processes. First, a temporal convolutional generalized mapping structure integrating spatiotemporal features is constructed within SIBN-PLDA to enhance the representation capability for non-stationary signals. In addition, the SIBN-PLDA model is extendable, allowing it to adapt to newly introduced fault samples and fault classes. Then, under new working conditions, a pseudo-label quality constraint mechanism, in which global and local classifiers cooperate, dynamically evaluates the pseudo-labels and adjusts the distribution alignment strategy, effectively improving cross-working-condition transfer performance. Finally, a sample confidence ranking approach with a center loss function is designed to enable accurate identification of unknown fault classes. The performance of the proposed method is validated on two industrial wind power gearbox datasets, where the experimental results show that it is superior to existing mainstream methods in diagnosis accuracy, cross-condition robustness, unknown-fault adaptability, and online update capacity, revealing strong practical application potential. Specifically, SIBN-PLDA achieves an average overall accuracy of 94.41% and 93.56% on the SEU and WT gearbox datasets, respectively, with the average accuracy of unknown fault class identification reaching 82.25% and 80.97% on the two datasets, which is 7.22% and 5.92% higher than the second-best method, respectively.
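A pseudo-label quality constraint of the kind described above can be sketched as an agreement-plus-confidence filter: a sample's pseudo-label is kept only when the global and local classifiers predict the same class and both are sufficiently confident. The NumPy snippet below is an illustrative simplification with a hypothetical threshold `tau`; the paper's mechanism additionally adapts the distribution alignment strategy, which is not modeled here.

```python
import numpy as np

def select_pseudo_labels(p_global, p_local, tau=0.8):
    """Keep a pseudo-label only when the global and local classifiers agree
    on the class and the weaker of the two confidences reaches tau.
    Returns the accepted labels and the indices of the accepted samples."""
    yg, yl = p_global.argmax(1), p_local.argmax(1)       # hard predictions
    conf = np.minimum(p_global.max(1), p_local.max(1))   # conservative confidence
    mask = (yg == yl) & (conf >= tau)
    return yg[mask], np.flatnonzero(mask)

# Softmax outputs for 3 samples, 2 fault classes (illustrative values)
p_g = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
p_l = np.array([[0.85, 0.15], [0.4, 0.6], [0.1, 0.9]])
labels, idx = select_pseudo_labels(p_g, p_l)
print(labels, idx)  # [0 1] [0 2]: sample 1 is rejected (classifiers disagree)
```

Filtering on the minimum of the two confidences keeps the constraint conservative, which is why such mechanisms reduce the noise that unreliable pseudo-labels inject into cross-condition adaptation.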
Title: A novel fault diagnosis method for wind turbine gearbox based on data-driven incremental broad network. Engineering Applications of Artificial Intelligence, vol. 168, Article 114050.
Pub Date: 2026-02-05 | DOI: 10.1016/j.engappai.2026.114101
Ronggang Ge, Yue Wang, Yonggeng Wei
Accurately measuring the size and spatial distribution of surface defects on steel products is essential for ensuring product quality. However, existing detection methods exhibit notable limitations in achieving high-precision measurement of defect dimensions and spatial localization. To address this issue, this study proposes two new modules: the reparameterized multi-scale receptive field module (RMRFM) and the hierarchical enhanced feature propagation network (HEFPN). By integrating the strengths of both modules, we further develop a unified detection architecture, termed You Only Look Once with Reparameterized Hierarchical Feature Learning (YOLO-RH). RMRFM significantly improves the model's measurement accuracy for defect size and spatial distribution without sacrificing detection speed, through a reparameterized multi-branch feature extraction strategy. Meanwhile, HEFPN introduces a cross-layer feature interaction mechanism that effectively preserves shallow-layer texture information during feature extraction, providing essential support for the accurate measurement of defect attributes. Extensive experiments conducted on the NEU-DET dataset demonstrate that both RMRFM and HEFPN yield strong individual performance, while their combination in YOLO-RH achieves AP50, AP50:95, AR, APS, and ARS scores of 71.9%, 38.4%, 55.1%, 49.1%, and 61.1%, respectively, which are 3.7%, 1.9%, 5.2%, 11.8%, and 12.6% higher than the baseline and consistently outperform other state-of-the-art methods. Furthermore, generalization experiments on the GC10-DET and PV-Multi datasets confirm the robustness of YOLO-RH across different materials and defect types. Finally, a steel defect measurement platform based on YOLO-RH is developed to validate its practical feasibility, offering a viable solution for intelligent defect measurement and automated sorting in real-world industrial environments.
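The reason reparameterized multi-branch extraction costs nothing at inference time is that convolution is linear: parallel branches with different kernel sizes can be merged into a single equivalent kernel once training is done. The 1-D NumPy sketch below demonstrates the principle (a RepVGG-style merge of a 3-tap and a 1-tap branch); it is an illustration of the general technique, not RMRFM itself.

```python
import numpy as np

rng = np.random.default_rng(0)
k3 = rng.standard_normal(3)   # 3-tap branch (used during training)
k1 = rng.standard_normal(1)   # pointwise branch (used during training)

# Reparameterization: embed the pointwise kernel in the centre of a 3-tap
# kernel and add, collapsing both branches into one convolution for inference.
k_merged = k3 + np.array([0.0, k1[0], 0.0])

x = rng.standard_normal(100)
two_branch = np.convolve(x, k3, mode="same") + np.convolve(x, k1, mode="same")
one_branch = np.convolve(x, k_merged, mode="same")
print(np.allclose(two_branch, one_branch))  # True: identical outputs
```

The merged model runs one convolution where training ran two, which is how multi-scale receptive fields can be added "without sacrificing detection speed".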
Title: A reparameterized hierarchical feature learning network designed for accurate sizing and localization of steel surface defects. Engineering Applications of Artificial Intelligence, vol. 168, Article 114101.
Pub Date : 2026-02-05DOI: 10.1016/j.engappai.2026.113949
Zhi Fang, Li Huang
Ensuring effective multi-agent decision-making (MAD) is crucial for tackling complex, high-stakes challenges in fields such as finance, healthcare, and defense, where collaborative, adaptive solutions are essential for navigating dynamic and interconnected environments. However, current methods still struggle with effective role distribution and system coordination, leading to issues like premature opinion convergence and suboptimal decision efficiency. These challenges limit the practical adoption of MAD systems in real-world applications. This paper introduces a novel multi-agent decision-making system based on symmetric authority distribution (SDMADS). By incorporating a dynamic delegation mechanism and a dynamic consensus strategy, our system enhances fairness and flexibility in decision-making. Additionally, we propose the Hierarchical Direct Preference Optimization (HDPO) algorithm to optimize agent behaviors across multiple levels. Experimental results demonstrate that our system significantly improves decision quality, increases the adoption of diverse opinions, reduces bias, and outperforms existing approaches in terms of decision efficiency and adaptability.
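The Hierarchical Direct Preference Optimization (HDPO) algorithm named above builds on the standard DPO objective. The abstract does not specify the hierarchical extension, so the sketch below shows only the base DPO loss for a single preference pair, with illustrative log-probability values; function and variable names are my own, not the paper's.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))),
    where logp_w/logp_l are the policy's log-probs of the preferred (w) and
    rejected (l) responses, and ref_logp_* come from a frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# The loss shrinks as the policy favours the preferred response more strongly
# than the reference model does.
loss_neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)   # no preference learned yet
loss_better  = dpo_loss(-4.0, -6.0, -5.0, -5.0)   # policy now prefers w over l
assert loss_better < loss_neutral
```

A hierarchical variant would presumably apply such a loss at several coordination levels (e.g. per-agent and per-team), but that design is not detailed in the abstract.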
{"title":"Symmetry-based authority allocation for enhanced multi-agent decision-making","authors":"Zhi Fang, Li Huang","doi":"10.1016/j.engappai.2026.113949","DOIUrl":"10.1016/j.engappai.2026.113949","url":null,"abstract":"<div><div>Ensuring effective multi-agent decision-making (MAD) is crucial for tackling complex, high-stakes challenges in fields such as finance, healthcare, and defense, where collaborative, adaptive solutions are essential for navigating dynamic and interconnected environments. However, current methods still struggle with effective role distribution and system coordination, leading to issues like premature opinion convergence and suboptimal decision efficiency. These challenges limit the practical adoption of MAD systems in real-world applications. This paper introduces a novel multi-agent decision-making system based on symmetric authority distribution (SDMADS). By incorporating a dynamic delegation mechanism and a dynamic consensus strategy, our system enhances fairness and flexibility in decision-making. Additionally, we propose the Hierarchical Direct Preference Optimization (HDPO) algorithm to optimize agent behaviors across multiple levels. 
Experimental results demonstrate that our system significantly improves decision quality, increases the adoption of diverse opinions, reduces bias, and outperforms existing approaches in terms of decision efficiency and adaptability.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"168 ","pages":"Article 113949"},"PeriodicalIF":8.0,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-05DOI: 10.1016/j.engappai.2026.113970
Salvatore Sessa, Luciano Rosati
We explore the application of quantum-enhanced machine learning techniques for the structural analysis of reinforced concrete sections subjected to combined axial force and biaxial bending. A quantum support vector machine framework is developed and employed to approximate the limit state surface that defines the ultimate capacity of reinforced concrete cross-sections. Several quantum kernel architectures, namely Fidelity, Mercer-inspired, ZZ-Feature Map, and HE2, are implemented and tested on two representative geometries: a symmetric rectangular section and an asymmetric L-shaped one. Kernel performance is evaluated in terms of classification accuracy, generalization behavior, and decision boundary coherence, showing strengths and limitations of each quantum feature map. Analyses have shown that the Fidelity, Mercer Static, and single-layer HE2 kernels provide the most robust and geometrically sound decision surfaces, thus achieving high classification accuracy and stability. In contrast, deeper or highly expressive circuits exhibit overfitting and reduced generalization. Despite the limited accuracy observed for some kernel configurations, the results show that quantum kernels can effectively approximate capacity domains, particularly in symmetric cases. Hence, quantum-enhanced models represent a promising direction for efficient structural verification, even if the technology is still in its early stages.
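Of the kernels listed, the fidelity kernel is the easiest to make concrete: K(x, y) = |⟨ψ(x)|ψ(y)⟩|², the squared overlap of the quantum states that encode two samples. The sketch below simulates this classically with a simple product-state angle encoding; the encoding, sample values, and function names are my own illustration, not the paper's circuit.

```python
import numpy as np

def angle_encode(x):
    """Product-state feature map: each feature x_i maps to a single-qubit state
    [cos(x_i / 2), sin(x_i / 2)]; the full state is their tensor product."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, np.array([np.cos(xi / 2), np.sin(xi / 2)]))
    return state

def fidelity_kernel(X):
    """K[i, j] = |<psi(x_i)|psi(x_j)>|^2 -- a symmetric, PSD kernel usable
    directly in a (quantum) support vector machine."""
    states = np.array([angle_encode(x) for x in X])
    overlaps = states @ states.T   # amplitudes are real here, so no conjugate
    return overlaps ** 2

# Toy 2-feature samples, e.g. normalized (axial force, bending moment) pairs.
X = np.array([[0.1, 0.4], [0.1, 0.4], [1.2, 2.0]])
K = fidelity_kernel(X)

assert np.allclose(np.diag(K), 1.0)   # each encoded state has unit norm
assert np.allclose(K, K.T)            # kernel matrix is symmetric
assert K[0, 1] > K[0, 2]              # identical samples overlap maximally
```

In a full pipeline the resulting kernel matrix would be passed to a standard SVM solver; the quantum advantage, if any, lies in feature maps whose overlaps are hard to simulate classically, unlike this small product-state example.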
{"title":"Quantum computing for capacity checks of reinforced concrete sections","authors":"Salvatore Sessa, Luciano Rosati","doi":"10.1016/j.engappai.2026.113970","DOIUrl":"10.1016/j.engappai.2026.113970","url":null,"abstract":"<div><div>We explore the application of quantum-enhanced machine learning techniques for the structural analysis of reinforced concrete sections subjected to combined axial force and biaxial bending. A quantum support vector machine framework is developed and employed to approximate the limit state surface that defines the ultimate capacity of Reinforced Concrete cross-sections. Several quantum kernel architectures, namely Fidelity, Mercer-inspired, ZZ-Feature Map, and HE2, are implemented and tested on two representative geometries: a symmetric rectangular section and an asymmetric L-shaped one. Kernel performance is evaluated in terms of classification accuracy, generalization behavior, and decision boundary coherence, showing strengths and limitations of each quantum feature map. Analyses have shown that the Fidelity, Mercer Static, and single-layer HE2 kernels provide the most robust and geometrically sound decision surfaces, thus achieving high classification accuracy and stability. On the contrary, deeper or highly expressive circuits exhibit overfitting and reduced generalization. Despite the limited accuracy observed for some kernel configurations, the results prove that quantum kernels can effectively approximate capacity domains, particularly in symmetric cases. 
Hence, quantum-enhanced models represent a promising direction for efficient structural verification, even if the technology is still in its early stages.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"168 ","pages":"Article 113970"},"PeriodicalIF":8.0,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The rapid disappearance of endangered languages presents a critical challenge to cultural preservation and linguistic diversity. Traditional grammatical induction techniques face the challenge of having limited annotated data for such languages. To address these issues, this study proposes a Neural-Symbolic Artificial Intelligence (AI) based Grammatical Inference framework. It integrates Graph Neural Networks (GNNs) with instruction-tuned language models for AI-driven grammar induction in low-resource languages. This framework uses Few-shot Relational Graph Convolutional Networks (FS-R-GCN) to convert grammatically erroneous texts into relation graphs. Then, the Instruction-Tuned Bidirectional Encoder Representations from Transformers (BERT) AI model processes these graph representations alongside the original erroneous sentences, generating grammatically accurate alternatives. This Instruction-Tuned Language model assists by identifying grammatical inconsistencies and providing structural guidance to the BERT model. Experimental results on the AlexEBall/Endangered_Languages_Capstone_Proj_1 dataset show that the proposed method improves performance significantly with 99.01 % accuracy, 98.70 % precision, and 98.00 % F1-score compared to the existing methods.
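The core of the relational GCN idea referenced above is that each relation type in the sentence graph gets its own weight matrix, and node features are updated by relation-specific message passing plus a self-loop term. The minimal numpy sketch below shows one such layer on a toy 4-token graph; the token indices, relation names, and random weights are illustrative assumptions, and the few-shot aspect of the paper's FS-R-GCN is not modeled here.

```python
import numpy as np

# Toy relation graph for a 4-token sentence: nodes are tokens, and each
# relation type (e.g. "subject-of", "modifier-of") has its own adjacency.
# Convention: A[i, j] = 1 means node i receives a message from node j.
num_nodes, feat_dim = 4, 8
rng = np.random.default_rng(1)
H = rng.standard_normal((num_nodes, feat_dim))            # initial node features

A_subj = np.zeros((num_nodes, num_nodes)); A_subj[0, 1] = 1  # tok0 <- tok1
A_mod  = np.zeros((num_nodes, num_nodes)); A_mod[2, 1]  = 1  # tok2 <- tok1
relations = [A_subj, A_mod]

# R-GCN-style layer: one weight matrix per relation, plus a self-loop matrix.
W_rel  = [rng.standard_normal((feat_dim, feat_dim)) for _ in relations]
W_self = rng.standard_normal((feat_dim, feat_dim))

def rgcn_layer(H, relations, W_rel, W_self):
    out = H @ W_self                                  # self-loop contribution
    for A, W in zip(relations, W_rel):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # per-node normalizer
        out += (A @ (H @ W)) / deg                    # relation-specific messages
    return np.maximum(out, 0.0)                       # ReLU

H1 = rgcn_layer(H, relations, W_rel, W_self)
assert H1.shape == H.shape
```

In the paper's pipeline, embeddings like `H1` would be consumed alongside the raw sentence by the instruction-tuned BERT model; how those two inputs are combined is not specified in the abstract.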
{"title":"Towards neural-symbolic grammatical inference for endangered languages using integrating graph neural networks and instruction-tuned language models","authors":"Manu Singh , Neha Gupta , Shiva Tyagi , Ashima Rani , Vinod Kumar , Surbhi Sharma","doi":"10.1016/j.engappai.2026.114011","DOIUrl":"10.1016/j.engappai.2026.114011","url":null,"abstract":"<div><div>The rapid disappearance of endangered languages presents a critical challenge to cultural preservation and linguistic diversity. Traditional grammatical induction techniques face the challenge of having limited annotated data for such languages. To address these issues, this study proposes a Neural-Symbolic Artificial Intelligence (AI) based Grammatical Inference framework. It integrates Graph Neural Networks (GNNs) with instruction-tuned language models for AI-driven grammar induction in low-resource languages. This framework uses Few-shot Relational Graph Convolutional Networks (FS-R-GCN) to convert grammatically erroneous texts into relation graphs. Then, the Instruction-Tuned Bidirectional Encoder Representations from Transformers (BERT) AI model processes these graph representations alongside the original erroneous sentences, generating grammatically accurate alternatives. This Instruction-Tuned Language model assists by identifying grammatical inconsistencies and providing structural guidance to the BERT model. 
Experimental results on the AlexEBall/Endangered_Languages_Capstone_Proj_1 dataset show that the proposed method improves performance significantly with 99.01 % accuracy, 98.70 % precision, and 98.00 % F1-score compared to the existing methods.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"168 ","pages":"Article 114011"},"PeriodicalIF":8.0,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}