Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114177
Hu Wang , Mao Ye , Dengyan Luo , Yan Gan
Existing photographic equipment cannot capture the full dynamic range of natural scenes. The problem of reconstructing a high dynamic range (HDR) image from multi-exposure low dynamic range (LDR) images therefore arises, because each exposure captures different details. Existing methods do not fully leverage imaging knowledge from the LDR image generation pipeline, resulting in design redundancy and inefficient resource utilization. We propose a new Multi-Exposure HDR reconstruction method that incorporates Imaging Knowledge (MEIK) for efficient HDR image reconstruction. Our method consists of two parts: fusion of LDR features and reconstruction of the HDR feature. Because of object motion and differing exposure times, LDR features with different exposures need to be fused; a Mamba-based Multi-Exposure Information Aggregation (MEIA) module is proposed for this purpose. An Inverse imaging Knowledge-Driven (IKD) cluster, a cascade of IKD blocks at different scales, then reconstructs the HDR feature. Each IKD block consists of three parts, HDR information recovery, imaging parameter adjustment, and noise suppression, which together simulate the mathematical model of multi-exposure HDR imaging. Experimental results demonstrate that the proposed MEIK model outperforms existing state-of-the-art models and exhibits strong scalability.
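The abstract does not give the multi-exposure imaging formula that the IKD block simulates. As background only, a minimal weighted-merge sketch in the Debevec-Malik style (assuming a linear camera response, which the paper does not state) shows how radiance is recovered from LDR exposures with different exposure times:

```python
import numpy as np

def merge_hdr(ldr_stack, exp_times):
    """Weighted multi-exposure HDR merge (linear camera response assumed).

    ldr_stack: (N, H, W) array of LDR images with values in [0, 1]
    exp_times: (N,) exposure times in seconds
    Radiance estimate: E = sum_i w(Z_i) * Z_i / t_i / sum_i w(Z_i),
    with a hat-shaped weight that trusts mid-range pixels most.
    """
    ldr = np.asarray(ldr_stack, dtype=float)
    t = np.asarray(exp_times, dtype=float).reshape(-1, 1, 1)
    w = 1.0 - np.abs(2.0 * ldr - 1.0)   # hat weight, peak at 0.5
    w = np.maximum(w, 1e-6)             # avoid division by zero weights
    return (w * ldr / t).sum(axis=0) / w.sum(axis=0)

# A pixel of true radiance 0.2 captured at exposures 0.5 s, 1 s, 2 s:
stack = np.array([[[0.1]], [[0.2]], [[0.4]]])
hdr = merge_hdr(stack, [0.5, 1.0, 2.0])  # recovers 0.2
```

This classical merge is what the IKD block's "HDR information recovery" and "imaging parameter adjustment" stages generalize with learned components; the sketch is not the paper's method.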
Title: Multi-exposure high dynamic range reconstruction by incorporating imaging knowledge (Engineering Applications of Artificial Intelligence, vol. 169, Article 114177).
Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114150
Dmitry Yudin , Alexander Lazarev , Eva Bakaeva , Angelika Kochetkova , Alexey Kovalev , Aleksandr Panov
Recent progress in visual data analysis has significantly improved the ability of autonomous robots to understand their surroundings and perform complex tasks. This paper presents a modular method named Scene Graph-driven Reasoning for Action Planning (SG-RAPL) designed for high-level planning in dynamic environments, enabling adaptive control of humanoid robots. The method employs a three-dimensional (3D) scene graph to represent the environment and detect abnormal situations, while a large language model (LLM) translates natural-language commands into consecutive low-level actions. An original perceptual segmentation and tracking module constructs the scene graph in real time by providing instance segmentation, obstacle detection, and object pose estimation using data fusion with Augmented Reality University of Cordoba (ArUco) markers. The Planner module decomposes high-level tasks into subtasks such as navigation and object manipulation. Extensive experiments conducted on a manually collected and annotated dataset demonstrate that the proposed artificial intelligence-based approach efficiently plans complex actions in both virtual and real-world warehouse environments. The code and dataset of the proposed approach will be made publicly available.
Title: Scene graph-driven reasoning for action planning of humanoid robot (Engineering Applications of Artificial Intelligence, vol. 169, Article 114150).
Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114097
Shaokai Zheng , Peng Yan , Shengsu Ni , Daolei Wang
The loss of photovoltaic (PV) power due to environmental soiling presents a significant challenge to the PV power generation industry, making accurate prediction and estimation of power loss critical. However, most existing algorithmic models rely on traditional fusion methods to integrate PV images and environmental factors (time and irradiance) across modalities, limiting their ability to effectively exploit high-quality cross-modal information for downstream tasks. This paper proposes a novel cross-modal interactive fusion mechanism, Large Kernel Cross-Attention Fusion (LKCA Fusion), and introduces a new photovoltaic soiling loss (PVSL) prediction and estimation model, Large Kernel Fusion Solar Network (LKFSolarNet). LKFSolarNet uses an improved image backbone architecture to efficiently extract features from PV soiling images, followed by LKCA Fusion to perform cross-modal fusion between these image features and environmental factors. LKCA Fusion incorporates lightweight large kernel convolutions to enhance the model's ability to capture global information across different PV modalities and to improve cross-modal interaction. Additionally, a Gradient Flow Enhanced branch is introduced to strengthen the training of the image backbone network, enhancing overall model performance. Experiments on the open-source Solar Panel Soiling Image dataset demonstrate that LKFSolarNet reduces Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) by 3.9% and 4.0%, respectively, in the prediction task and improves accuracy by 3.6% in the 16-class estimation task. Compared to the latest methods, LKFSolarNet reduces MAE and RMSE by 19.7% and 5.9%, respectively, and shows some improvement in estimation accuracy.
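For readers unfamiliar with the two error metrics reported above, MAE and RMSE can be computed as follows (a generic sketch; the example values are illustrative, not the paper's data):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of prediction errors."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Illustrative soiling-loss fractions (hypothetical, not from the paper):
y_true = [0.10, 0.25, 0.40]
y_pred = [0.12, 0.20, 0.43]
m = mae(y_true, y_pred)    # (0.02 + 0.05 + 0.03) / 3
r = rmse(y_true, y_pred)
```

Because RMSE squares each error before averaging, a model that occasionally misses badly is penalized more on RMSE than on MAE, which is why both are reported.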
Title: A deep learning model for photovoltaic soiling loss prediction and estimation based on Large Kernel Cross-Attention Fusion (Engineering Applications of Artificial Intelligence, vol. 169, Article 114097).
Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114135
Kai Zhao , Yaoguo Dang , Shan Huang , Junjie Wang
Traditional grey prediction models are limited by an insufficient ability to capture nonlinear trends, inefficient use of new information, and insufficient attenuation of noise from old information, which makes it difficult for them to meet the requirements of complex scenarios. To address these limitations, this study proposes a high-order Logistic grey model with external grey information. The core innovations of the model are: (1) the inherent internal grey information of the system and the external grey information supplemented by the environment are synergistically integrated; (2) the model adopts a high-order Logistic accumulation operator whose parameter range is significantly expanded to [−1, 1]; (3) the model dynamically suppresses long-term noise interference by introducing a time decay factor. The mathematical derivation of the proposed accumulation operator and the model's properties is then carried out, and the rationality of external grey information is theoretically proven. Finally, in tests on eight typical scenarios, both domestic and international (China's air quality, carbon emissions, electricity consumption, foundation settlement, renewable energy consumption, hydropower installed capacity, United States publication output, and Poland's renewable energy consumption), the model demonstrated strong robustness (e.g., a prediction error as low as 0.12% in renewable energy consumption forecasting). This provides an effective tool for small-sample prediction in complex scenarios and greatly facilitates engineering applications of grey prediction theory.
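The paper's high-order Logistic accumulation operator is not specified in the abstract. As background, the classical first-order accumulated generating operator (1-AGO) that grey prediction models build on, together with its inverse, can be sketched as:

```python
import numpy as np

def ago(x):
    """Classical first-order accumulated generating operator (1-AGO).

    Accumulation smooths randomness in a small sample so a grey model
    can be fitted to the accumulated series.
    """
    return np.cumsum(np.asarray(x, dtype=float))

def iago(y):
    """Inverse AGO: recover the original series from its accumulation."""
    y = np.asarray(y, dtype=float)
    return np.diff(y, prepend=0.0)

x = np.array([2.0, 3.0, 5.0])
acc = ago(x)     # [2, 5, 10]
rec = iago(acc)  # [2, 3, 5]
```

The proposed operator generalizes this idea with a Logistic weighting whose parameter ranges over [−1, 1]; the exact form is given in the paper, not here.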
Title: External grey information model and its performance (Engineering Applications of Artificial Intelligence, vol. 169, Article 114135).
Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114154
Zheng Zhang , Jun Wan , Jun Liu , Mingyang Zhou , Kezhong Lu , Claudio J. Tessone , Guoliang Chen , Hao Liao
As a main threat to the healthy development of major internet platforms, fraud is increasingly carried out in organized, group-based forms. Such collusive fraud makes it easier to obtain illegal benefits at a lower risk of exposure. Recently, graph neural network (GNN)-based fraud detection methods have attracted increasing attention due to their ability to address camouflage in fraud scenarios. However, fraudsters' evolving camouflage strategies pose great challenges to the design of GNN-based detection models. Furthermore, most existing GNN-based approaches focus on representation learning of node-level and structural-level features and often ignore the contextual high-order information of the fraud group to which a fraudulent node belongs. To address these limitations, this paper proposes a community context-driven and frequency-adaptive graph neural network (CCFA-GNN) for detecting collaboratively camouflaged review fraudsters. Specifically, a collusive reviewer graph is constructed to capture the deep collaborative relationships among fraudsters. We then incorporate the high-order representation of collusive fraud into graph embedding learning for community context, based on maximizing the co-occurrence probability of fraudsters. Finally, a frequency-adaptive feature aggregation module simultaneously leverages the high-frequency and low-frequency information of features to enhance node embedding representations. Extensive experiments on real-world fraud datasets verify the effectiveness, robustness, and interpretability of the proposed model, rendering it highly suitable for fraud detection in e-commerce and financial transaction scenarios.
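The idea behind frequency-adaptive aggregation on a graph can be illustrated with a minimal sketch (this is a generic low-pass/high-pass mix, not the paper's CCFA-GNN module; `alpha` stands in for whatever learned mixing the model uses):

```python
import numpy as np

def freq_adaptive_aggregate(A, X, alpha=0.5):
    """Mix low- and high-frequency graph aggregation (illustrative sketch).

    Low-pass smooths a node toward the mean of its neighbours (good when
    neighbours share the node's label); high-pass keeps the node's
    difference from its neighbours (good when fraudsters camouflage among
    benign neighbours). alpha in [0, 1] weights the two views.
    A: (n, n) adjacency without self-loops, X: (n, d) node features.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    A_hat = A / deg                  # row-normalised adjacency
    low = X + A_hat @ X              # low-pass: self + neighbour mean
    high = X - A_hat @ X             # high-pass: self - neighbour mean
    return alpha * low + (1.0 - alpha) * high

# Two connected nodes with features 1 and 3; pure low-pass averages them:
A = np.array([[0, 1], [1, 0]])
X = np.array([[1.0], [3.0]])
out = freq_adaptive_aggregate(A, X, alpha=1.0)  # both rows become 4.0
```

High-frequency information matters in fraud detection precisely because camouflaged fraudsters tend to differ from their (deliberately benign-looking) neighbourhoods.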
Title: Leveraging community context and frequency-adaptive aggregation for robust fraud detection (Engineering Applications of Artificial Intelligence, vol. 169, Article 114154).
Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114104
Peng You, Peng Chen, Xi Li, Ang Bian
Internet of Things (IoT) devices generate massive time series data during operation, which are crucial for device monitoring, fault prediction, and system security. However, these data often contain noise and exhibit complex spatio–temporal characteristics, posing significant challenges to anomaly detection. To address these challenges, this paper proposes an unsupervised anomaly detection model, the Gated Memory-guided Multi-scale spatio–temporal–spectral feature fusion network (GMMnet). GMMnet first leverages Positional Multi-scale Temporal-convolution and Multi-scale Spatio-spectral Self-attention to efficiently learn the temporal and spatio-spectral features of time series data, with adaptive threshold filtering employed to mitigate high-frequency noise. By introducing Gated Memory-guided Fusion, GMMnet accurately fuses the normal spatio–temporal–spectral features within the data, effectively guiding the training process and significantly enhancing generalization. Additionally, a Radial Basis Function-based Enhanced Reconstruction module is proposed to further improve GMMnet's capability to detect subtle anomalies. Extensive experiments on five publicly available IoT time series datasets demonstrate that the proposed method outperforms thirteen existing state-of-the-art baselines on nine metrics, with an average F1 score improvement of 17.72%.
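Adaptive threshold filtering in the spectral domain can be sketched generically (this uses a simple mean-magnitude threshold as a stand-in; the paper's filtering rule is not specified in the abstract):

```python
import numpy as np

def spectral_denoise(x):
    """Suppress weak spectral components with a data-driven threshold.

    Components whose magnitude falls below the mean spectrum magnitude
    are zeroed; strong periodic structure survives while broadband
    high-frequency noise is attenuated.
    """
    spec = np.fft.rfft(np.asarray(x, dtype=float))
    thresh = np.abs(spec).mean()      # adaptive: derived from the data
    spec[np.abs(spec) < thresh] = 0.0
    return np.fft.irfft(spec, n=len(x))

# A 4 Hz sine buried in white noise:
t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 4 * t)
noisy = clean + 0.05 * np.random.default_rng(0).standard_normal(256)
denoised = spectral_denoise(noisy)
```

Because the dominant frequency's magnitude pulls the mean threshold well above the noise floor, the noise bins are zeroed while the signal bin is kept.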
Title: Gated Memory-Guided Multi-scale spatio–temporal–spectral feature fusion network for unsupervised Internet of Things time series anomaly detection (Engineering Applications of Artificial Intelligence, vol. 169, Article 114104).
Pub Date: 2026-02-11 | DOI: 10.1016/j.engappai.2026.114095
Linfei Yin, Yufeng Liu
With its increasing share of the renewable energy mix, wind energy has become a backbone of the low-carbon energy structure. Short-term wind power prediction supports the real-time dispatching needs of wind farms and power grids. To address the low prediction accuracy and long training times of existing models for short-term wind power prediction, this study proposes a large-model method that fuses bidirectional encoder representations from Transformer with a quantum dual-stage attention bidirectional gated recurrent unit and a diffusion model. The proposed method uses improved complete ensemble empirical mode decomposition with adaptive noise to decompose the wind power series; the decomposed data are then fed into the quantum dual-stage attention bidirectional gated recurrent unit and the quantum diffusion model for training and prediction, after which the bidirectional encoder representations from Transformer produce the final wind power prediction. Compared with 52 prediction algorithms, the mean absolute error of the proposed method is more than 30.57% lower. Furthermore, the addition of parameterized quantum circuits shortens training and prediction time by nearly 25%.
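The decompose-then-predict pipeline described above can be sketched in skeleton form. This sketch substitutes a trivial moving-average trend/residual split for the paper's CEEMDAN decomposition and a persistence forecast for its quantum recurrent and diffusion predictors; only the pipeline shape is faithful:

```python
import numpy as np

def decompose(x, window=4):
    """Split a series into a smooth trend and a residual.

    Stand-in for the improved CEEMDAN decomposition used in the paper;
    the components sum exactly back to the original series.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    return trend, x - trend

def predict_components(trend, resid):
    """Hypothetical per-component one-step forecasts (persistence)."""
    return trend[-1], resid[-1]

# Decompose, forecast each component, then recombine:
x = np.sin(np.linspace(0, 3, 64)) + 0.1
trend, resid = decompose(x)
forecast = sum(predict_components(trend, resid))
```

The design rationale is that each decomposed component is smoother and easier to model than the raw series, and the component forecasts are recombined into the final prediction.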
Title: Bidirectional encoder representations from transformer fusion quantum dual-stage attention bidirectional gated recurrent unit and diffusion method for short-term wind power prediction (Engineering Applications of Artificial Intelligence, vol. 169, Article 114095).
Pub Date: 2026-02-11 | DOI: 10.1016/j.engappai.2026.114176
Peiyi Yang, Xuewei Wu, Peihan Wen
With the increasing complexity of technology and the interdisciplinary nature of research and development (R&D) activities, quickly acquiring highly relevant and interpretable patent knowledge has become an important issue in engineering innovation. Given the high information density, abundant specialized terminology, and complex structure of patent texts, as well as the stringent requirements for accuracy and interpretability in knowledge recommendation within R&D scenarios, a technology knowledge recommendation framework based on patent texts is proposed that integrates knowledge ontology, graph representation learning, and Retrieval-Augmented Generation (RAG). First, a hierarchical fine-grained ontology model for patent technology knowledge is constructed; on this basis, efficient knowledge extraction and knowledge graph (KG) construction are realized through two-stage prompt engineering. Second, a KG representation learning method combining semantic and structural information is proposed, which integrates semantically enhanced representations with relational graph convolutional networks to mine technology knowledge. Finally, an ontology-driven meta-path generation strategy is introduced and integrated with RAG: reasoning paths are generated by a large language model (LLM), and a pruning mechanism based on LLM scores improves the relevance and interpretability of the recommended content. Case-based experiments demonstrate that the proposed method outperforms baseline approaches and provides technical support for the reuse and innovation of patent knowledge in R&D scenarios.
{"title":"Patent technology knowledge recommendation by integrating large language models and knowledge graphs","authors":"Peiyi Yang, Xuewei Wu, Peihan Wen","doi":"10.1016/j.engappai.2026.114176","DOIUrl":"10.1016/j.engappai.2026.114176","url":null,"abstract":"<div><div>With the increasing complexity of technology and the interdisciplinary nature of research and development (R&D) activities, how to quickly acquire highly relevant and interpretable patent knowledge has become an important issue in engineering innovation. In view of the high information density, abundance of specialized terminology, and complex structure inherent in patent texts, as well as the stringent requirements for accuracy and interpretability in knowledge recommendation within R&D scenarios, a technology knowledge recommendation framework based on patent texts was proposed, which integrates knowledge ontology, graph representation learning, and Retrieval-Augmented Generation (RAG). Firstly, a hierarchical fine-grained ontology model for patent technology knowledge was constructed. On this basis, efficient knowledge extraction and knowledge graph (KG) construction were realized through two-stage prompt engineering. Secondly, a KG representation learning method combining semantic information and structural information is proposed, which integrates semantic enhanced representation and relational graph convolutional networks to realize technology knowledge mining. Finally, the ontology-driven meta-path generation strategy is introduced and integrated with RAG, the reasoning path is generated through large language model (LLM), and the pruning mechanism based on LLM score is introduced to augment the relevance and interpretability of the recommended content. 
Case-based experiments demonstrate that the proposed method outperforms baseline approaches and provides technical support for the reuse and innovation of patent knowledge in R&D scenarios.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"169 ","pages":"Article 114176"},"PeriodicalIF":8.0,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
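The abstract does not give the exact formulation of the relational graph convolutional network used for KG representation learning. A minimal NumPy sketch of one relational graph convolution layer in the style of Schlichtkrull et al. is shown below; the function name, weight layout, and the use of semantic features as initial node embeddings are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rgcn_layer(h, edges, num_rels, W_self, W_rel):
    """One relational graph convolution layer: each relation type r has its
    own weight matrix W_rel[r]; messages from r-neighbours are mean-normalised
    per target node and summed with a self-loop term, then passed through ReLU.
    h: (n, d) node features; edges: list of (src, dst, rel) triples."""
    n, d = h.shape
    out = h @ W_self.T  # self-loop transformation
    for r in range(num_rels):
        rel_edges = [(s, t) for (s, t, rr) in edges if rr == r]
        # count incoming r-edges per target node for mean normalisation
        deg = np.zeros(n)
        for s, t in rel_edges:
            deg[t] += 1
        for s, t in rel_edges:
            out[t] += (h[s] @ W_rel[r].T) / deg[t]
    return np.maximum(out, 0.0)  # ReLU activation
```

Stacking two or three such layers over the constructed patent KG would yield the structure-aware node embeddings that the framework combines with the semantically enhanced representations.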
Pub Date : 2026-02-11DOI: 10.1016/j.engappai.2026.114141
Shilong Wang, Hai Cui, Yanchen Qu, Xiaobo Li, Yijia Zhang
Drug-drug interactions can result in severe adverse reactions and pose substantial threats to public health. Therefore, accurately predicting potential interactions between drugs has become a critical research direction. Recently, network-based approaches have created new opportunities in this field. However, existing methods still struggle to generalize to unseen drugs and are susceptible to structural perturbations, leading to limited robustness. Moreover, most methods rely on shallow or static fusion of heterogeneous drug representations, lacking mechanisms to adaptively capture complementary structural and semantic information. To address these issues, we propose a dual-channel heterogeneous graph framework that performs gated feature fusion between molecular graph representations and biomedical knowledge graph representations, while incorporating multi-view contrastive learning to enhance prediction performance. The proposed framework leverages a pretrained heterogeneous graph neural network to jointly model structural and semantic dependencies, thereby improving representation quality and model generalization. In addition, a multi-view contrastive learning strategy is introduced to further strengthen the discriminative power and robustness of drug representations. Experimental results demonstrate that our method consistently outperforms state-of-the-art models across all benchmark datasets. Further case studies confirm its effectiveness in predicting drug interaction relationships, highlighting its potential to provide reliable computational support for clinical decision-making and drug discovery.
{"title":"Dual-channel heterogeneous graph framework with multi-view contrastive learning for drug–drug interaction prediction","authors":"Shilong Wang, Hai Cui, Yanchen Qu, Xiaobo Li, Yijia Zhang","doi":"10.1016/j.engappai.2026.114141","DOIUrl":"10.1016/j.engappai.2026.114141","url":null,"abstract":"<div><div>Drug-drug interactions can result in severe adverse reactions and pose substantial threats to public health. Therefore, accurately predicting potential interactions between drugs has become a critical research direction. Recently, network-based approaches have created new opportunities in this field. However, existing methods still struggle to generalize to unseen drugs and are susceptible to structural perturbations, leading to limited robustness. Moreover, most methods rely on shallow or static fusion of heterogeneous drug representations, lacking mechanisms to adaptively capture complementary structural and semantic information. To address these issues, we propose a dual-channel heterogeneous graph framework that performs gated feature fusion between molecular graph representations and biomedical knowledge graph representations, while incorporating multi-view contrastive learning to enhance prediction performance. The proposed framework leverages a pretrained heterogeneous graph neural network to jointly model structural and semantic dependencies, thereby improving representation quality and model generalization. In addition, a multi-view contrastive learning strategy is introduced to further strengthen the discriminative power and robustness of drug representations. Experimental results demonstrate that our method consistently outperforms state-of-the-art models across all benchmark datasets. 
Further case studies confirm its effectiveness in predicting drug interaction relationships, highlighting its potential to provide reliable computational support for clinical decision-making and drug discovery.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"169 ","pages":"Article 114141"},"PeriodicalIF":8.0,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
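The abstract describes gated feature fusion between molecular-graph and knowledge-graph representations but not its parameterisation. One common gating scheme, sketched here as an assumption rather than the paper's exact design, computes a per-dimension sigmoid gate from the concatenated views and interpolates between them:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_mol, h_kg, W_g, b_g):
    """Fuse a molecular-graph embedding with a biomedical KG embedding through
    a learned gate: g in (0, 1) decides, per dimension, how much of each view
    to keep. W_g: (d, 2d) gate weights, b_g: (d,) bias (illustrative names)."""
    g = sigmoid(np.concatenate([h_mol, h_kg]) @ W_g.T + b_g)
    return g * h_mol + (1.0 - g) * h_kg
```

With zero-initialised gate parameters the gate starts at 0.5, i.e. an even blend of the structural and semantic views; training then learns which dimensions of which view to trust for each drug.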
Pub Date : 2026-02-11DOI: 10.1016/j.engappai.2026.114063
Junchao Zhou , Ao Chen , Shangwu Huang , Jianjie Gao , Haiping Du
Accelerating global urbanization has intensified the demand for efficient and sustainable transportation solutions in high-density areas. Traditional ground-based transit systems face congestion and pollution challenges in spatially constrained regions. Against this backdrop, the Straddle-type Monorail System (SMS), distinguished by its lightweight structure, lower infrastructure costs, and unique elevated spatial efficiency, emerges as a critical option for optimizing urban commuting networks. However, a fundamental challenge for Straddle-type Monorail Vehicle (SMV) operational safety is lateral shimmy vibration instability. Conventional dynamic modelling approaches struggle to predict shimmy bifurcation boundaries effectively due to computational inefficiency and poor parametric generalization. To address these limitations, this research proposes a novel meta-learning framework named MAML-CNN-LSTM-Attention (M-CLA) for few-shot critical speed prediction, which integrates Model-Agnostic Meta-Learning (MAML), a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and an attention mechanism. Trained on data from a 7-DOF vehicle-track coupling model, the M-CLA framework processes lateral displacement and velocity time-series data to achieve 99.67% prediction accuracy for the critical speed under few-shot conditions. The framework adapts rapidly to new operational scenarios with minimal data, outperforming traditional deep learning methods in both prediction accuracy and cross-condition generalization. It provides infrastructure managers with an Artificial Intelligence (AI)-driven tool for dynamic optimization and safety evaluation of SMS, contributing to derailment prevention, maintenance cost reduction, and enhanced operational safety across diverse urban rail transit environments.
{"title":"Adaptive critical speed prediction for straddle-type monorail operational safety: A meta-learning framework with few-shot deployment","authors":"Junchao Zhou , Ao Chen , Shangwu Huang , Jianjie Gao , Haiping Du","doi":"10.1016/j.engappai.2026.114063","DOIUrl":"10.1016/j.engappai.2026.114063","url":null,"abstract":"<div><div>Accelerating global urbanization has intensified the demand for efficient and sustainable transportation solutions in high-density areas. Traditional ground-based transit systems face congestion and pollution challenges in spatially constrained regions. Against this backdrop, the Straddle-type Monorail System (SMS), distinguished by its lightweight structure, lower infrastructure costs, and unique elevated spatial efficiency, emerges as a critical option for optimizing urban commuting networks. However, a fundamental challenge for Straddle-type Monorail Vehicle (SMV) operational safety is lateral shimmy vibration instability. Conventional dynamic modelling approaches struggle to predict shimmy bifurcation boundaries effectively due to computational inefficiency and poor parametric generalization. To address these limitations, this research proposes a novel meta-learning framework named MAML-CNN-LSTM-Attention (M-CLA) for few-shot critical speed prediction, which integrates Model-Agnostic Meta-Learning (MAML), a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and an attention mechanism. Trained on data from a 7-DOF vehicle-track coupling model, the M-CLA framework processes lateral displacement and velocity time-series data to achieve 99.67% prediction accuracy for the critical speed under few-shot conditions. The framework adapts rapidly to new operational scenarios with minimal data, outperforming traditional deep learning methods in both prediction accuracy and cross-condition generalization. It provides infrastructure managers with an Artificial Intelligence (AI)-driven tool for dynamic optimization and safety evaluation of SMS, contributing to derailment prevention, maintenance cost reduction, and enhanced operational safety across diverse urban rail transit environments.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"169 ","pages":"Article 114063"},"PeriodicalIF":8.0,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
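The MAML component of M-CLA follows the standard inner-loop/outer-loop scheme: adapt on a few support samples per task, then meta-update the shared initialisation from query-set losses. A toy first-order sketch on 1-D linear regression (a stand-in for the paper's CNN-LSTM-Attention learner; all names and learning rates are illustrative) is:

```python
import numpy as np

def maml_step(theta, tasks, inner_lr=0.05, outer_lr=0.01):
    """One first-order MAML meta-update for y = w*x regression.
    For each task (xs, ys, xq, yq): take one SGD step on the support set,
    then accumulate the query-set gradient at the adapted parameters and
    apply the averaged gradient to the shared initialisation theta."""
    meta_grad = 0.0
    for xs, ys, xq, yq in tasks:
        # inner loop: one gradient step on the support set (MSE loss)
        grad_support = np.mean(2.0 * (theta * xs - ys) * xs)
        theta_task = theta - inner_lr * grad_support
        # outer loop: query gradient at the adapted parameters (FOMAML)
        meta_grad += np.mean(2.0 * (theta_task * xq - yq) * xq)
    return theta - outer_lr * meta_grad / len(tasks)
```

Iterating this update moves the initialisation toward a point from which one inner step suffices on a new task, which is the mechanism behind the few-shot adaptation the abstract reports.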