Applied Intelligence: Latest Articles

From data to actionable knowledge: AI-AR integration framework for industrial knowledge management
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-10 DOI: 10.1007/s10489-026-07184-3
Sara Scheffer, Wanting Mao, Arnab Majumdar

Industrial knowledge management (KM) remains highly fragmented, with knowledge capture, structuring, and application often treated as isolated activities. Conventional approaches depend on static documentation and experience-based practices, which limit their responsiveness in dynamic, operational environments. In this paper, we design a comprehensive framework for the integration of artificial intelligence (AI) and augmented reality (AR) in industrial maintenance and diagnostics. The framework leverages AI techniques, including natural language processing (NLP) for extracting and structuring domain knowledge and machine learning (ML) models for predictive fault classification. These AI capabilities are seamlessly combined with AR technologies to deliver immersive, context-aware, in-situ task guidance, thereby enhancing decision-making, reducing downtime, and supporting efficient, knowledge-based maintenance processes. A case study is employed to demonstrate the framework’s feasibility by developing a prototype system deployed in a maintenance setting. The AI module clusters maintenance actions and predicts task categories, while the AR component renders step-by-step maintenance instructions anchored in the physical workspace. This integration enables the transition from unstructured, digital maintenance records to step-by-step in-situ repair guidance, reducing reliance on expert memory or static manuals. The results show that combining AI-driven text analytics with AR-based visualisation creates a cohesive knowledge workflow that improves operational efficiency. The proposed framework offers a scalable approach to embedding KM into frontline industrial routines, laying the foundation for more adaptive, technician-centred knowledge systems.
Citations: 0
Learning temporal and correlated features from multivariate time series for auxiliary diagnosis of lumbar disc herniation
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07173-6
Xian Yang, Yahui Zhao, Fei Yin, Zhenguo Zhang

Patients with lumbar disc herniation exhibit distinctive gait patterns, making gait-based analysis a promising approach for disease detection. Gait data are periodic multivariate time series (MTS), where discriminative information is often distributed across multiple temporal scales. However, existing methods primarily focus on intra-sequence temporal features and overlook inter-sequence correlations and multi-period characteristics. To address these issues, we propose a Temporal and Correlated features extraction Network (TCGNet). Fast Fourier Transform (FFT) is used to identify dominant frequencies and decompose MTS into multi-period subsequences. A dual-branch architecture extracts intra-sequence temporal features via convolution and inter-sequence correlated features via adaptive graph convolution, followed by feature fusion. Experiments on nine benchmark datasets demonstrate superior classification performance, and gait-based lumbar disc herniation diagnosis further validates the effectiveness of TCGNet.
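The FFT step described above can be illustrated with a minimal sketch. It is not the TCGNet implementation: it assumes a single channel, uses a naive DFT so that it runs on the standard library alone, and the function names (`dominant_periods`, `split_by_period`) and the synthetic signal are hypothetical. For a multivariate series, one plausible variant would average the amplitude spectrum over channels before picking the strongest bins.

```python
import cmath
import math


def dft_amplitudes(x):
    """Amplitude of each non-negative frequency bin of a real sequence (naive O(n^2) DFT)."""
    n = len(x)
    amps = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amps.append(abs(s) / n)
    return amps


def dominant_periods(x, top_k=2):
    """Periods (in samples) of the top_k strongest non-DC frequency bins."""
    n = len(x)
    amps = dft_amplitudes(x)
    # Skip bin 0 (the mean) and rank the remaining bins by amplitude.
    bins = sorted(range(1, len(amps)), key=lambda k: amps[k], reverse=True)[:top_k]
    return [n // k for k in bins]


def split_by_period(x, period):
    """Cut a sequence into consecutive one-period subsequences (incomplete tail dropped)."""
    return [x[i:i + period] for i in range(0, len(x) - period + 1, period)]


# Synthetic gait-like signal (assumed): a strong period of 20 samples plus a weak period of 5.
signal = [math.sin(2 * math.pi * t / 20) + 0.4 * math.sin(2 * math.pi * t / 5)
          for t in range(100)]
periods = dominant_periods(signal, top_k=2)
segments = split_by_period(signal, periods[0])
```

Each detected period yields its own set of subsequences, which is the sense in which one series is decomposed into multi-period views before feature extraction.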
Citations: 0
Infrared small target detection using multi-scale attention and dilated separable convolution
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07181-6
Wenjuan Tang, Qun Dai, Yuning Zhu, Rui Ma

Detecting small infrared targets is a critical challenge in computer vision, focusing on identifying and localizing targets that occupy only a few pixels within infrared imagery. The task is difficult because of the extremely small size of the targets, variability in target dimensions and shapes across diverse scenes, complex backgrounds, and target occlusion. In this study, we introduce an enhanced U-Net model featuring three novel modules: the Residual Coordinate Attention Block (RCA-Block) integrates coordinate attention and residual structures to enhance feature representation; the Dynamic Context-aware Multi-scale Fusion Module (DCMFM) dynamically fuses multi-scale features based on target characteristics; and the Multi-dilation Depthwise Separable Convolutional Module (MDSCM) employs multi-dilation depthwise separable convolutions to capture spatial features across varying receptive fields. Experiments demonstrate that the proposed model synergizes multi-scale feature fusion and spatial information enhancement, achieving superior accuracy in small target detection and robust noise resistance.
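The appeal of multi-dilation depthwise separable convolutions can be seen from two standard pieces of arithmetic: a k-tap kernel with dilation d spans k + (k - 1)(d - 1) pixels, and a depthwise separable layer needs k·k·C_in + C_in·C_out weights instead of k·k·C_in·C_out. The sketch below only evaluates these formulas. The kernel size 3, dilation rates (1, 2, 4), and channel count 64 are illustrative assumptions, not values taken from the paper.

```python
def effective_kernel(k, dilation):
    """Span covered by a k-tap kernel with the given dilation: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (dilation - 1)


def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


# Assumed configuration: 3 x 3 kernels, three parallel dilation rates, 64 -> 64 channels.
spans = {d: effective_kernel(3, d) for d in (1, 2, 4)}
std = standard_conv_params(3, 64, 64)
sep = depthwise_separable_params(3, 64, 64)
```

With these assumed settings the three branches cover 3, 5, and 9 pixels per side, while each branch uses roughly one eighth of the weights of a standard convolution.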
Citations: 0
STDGFN: A spatio-temporal dual-graph fusion network for traffic flow prediction
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07170-9
Ruotian Ye, Yitong Tao, Qingjian Ni

Accurate traffic flow forecasting serves as a critical technical foundation for building intelligent transportation systems and enhancing road network operational efficiency. Traffic flow patterns exhibit significant multi-time-scale characteristics, encompassing both long-term macroscopic trends and short-term fluctuations. The dynamic interactions across these scales constitute a core challenge in traffic prediction. Although a few studies have considered multi-scale feature extraction, they often adopt fixed paradigms to process information from different temporal scales, failing to effectively model the dynamic correlation mechanisms among local temporal patterns at multiple scales as traffic conditions evolve. Furthermore, dynamic spatial dependency modeling based on graph convolution is often hampered by the over-smoothing of neighborhood information, which weakens the discriminative power of critical spatio-temporal features and limits the model’s adaptability to complex traffic scenarios. To address these issues, we propose a Spatio-Temporal Dual-Graph Fusion Network (STDGFN). The model first partitions traffic flow sequences based on multi-scale time windows, encoding each segment into a temporal pattern representation that captures local contextual information. It then employs stacked Spatio-Temporal Feature Extraction Blocks (STFEB) to capture complex spatio-temporal dependencies at varying granularities, including dynamic interactions among multi-scale temporal patterns and spatial correlations between nodes. To mitigate the over-smoothing caused by deep graph convolution, we design a Spatial-Aware Gated Pattern Enhancer (SAGPE) that integrates node-level spatial attributes into a gating mechanism, effectively reinforcing the spatial heterogeneity of each node. Experimental results demonstrate that STDGFN outperforms existing state-of-the-art baseline methods on five real-world traffic datasets.
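The first stage described above, partitioning a flow sequence with several window sizes, can be sketched as follows. This is an illustration under assumptions: the window sizes (3, 6, 12) and the flow readings are invented, and `summarize` is a hypothetical stand-in (mean and net change) for the learned temporal pattern encoder that the paper uses.

```python
def partition(series, window):
    """Split a sequence into consecutive non-overlapping windows (incomplete tail dropped)."""
    return [series[i:i + window] for i in range(0, len(series) - window + 1, window)]


def multi_scale_patches(series, windows=(3, 6, 12)):
    """Map each window size to the list of segments it produces."""
    return {w: partition(series, w) for w in windows}


def summarize(patch):
    """Placeholder segment encoding: (mean level, net change over the segment)."""
    return (sum(patch) / len(patch), patch[-1] - patch[0])


# Assumed example: twelve consecutive flow readings from one sensor.
flows = [10, 12, 15, 20, 26, 30, 28, 25, 21, 16, 12, 10]
patches = multi_scale_patches(flows)
coarse = summarize(patches[12][0])   # one long-term segment
fine = [summarize(p) for p in patches[3]]  # four short-term segments
```

The coarse segment sees no net change over the whole horizon, while the fine segments expose the rise and fall within it, which is the kind of cross-scale contrast the model is designed to exploit.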
Citations: 0
Region feature enhancement and multi-view diffusion graph convolutional network for traffic accident risk prediction
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07140-1
Xinlu Zong, Jiawei Guo, Siyu Dong

Traffic accident prediction is a crucial component of Intelligent Transportation Systems (ITS), playing a vital role in reducing accident rates and enhancing urban traffic safety. However, accurately forecasting accident risk remains challenging due to the complex spatial–temporal dependencies and multiple external influencing factors. To address this issue, this study proposes a Region Feature Enhancement and Multi-view Diffusion Graph Convolutional Network (RFE-MDGCN) for traffic accident risk prediction. The model introduces a Region Feature Enhancement (RFE) module that employs channel-level and spatial-level attention mechanisms to capture spatial dependencies and highlight region-specific risk features. A Multi-view Diffusion Graph Convolutional Network is further designed to model similarity and semantic relationships among urban regions through a graph diffusion process, thereby enriching node representations with high-order neighbor information. To capture temporal dependencies, a Spatial–Temporal Fusion module integrating a Long Short-Term Memory–Transformer Encoder (LT Encoder) is developed to dynamically learn both short-term continuity and long-term periodicity of accident risk. Experimental results on two real-world datasets, New York City (NYC) and Chicago (ZJG), show that the proposed method achieves improvements of 11.45% in Recall and 16.10% in Mean Average Precision (MAP) over state-of-the-art baselines, demonstrating its robustness and effectiveness for traffic accident risk prediction.
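How a graph diffusion process brings high-order neighbor information into a node representation can be shown with a small sketch. It assumes a common formulation, a weighted sum of powers of the row-normalised adjacency matrix applied to the features, and a single graph; the paper's exact diffusion operator and its multiple views (similarity and semantic graphs) are not reproduced here. The three-region path graph and the weights are invented for illustration.

```python
def row_normalize(adj):
    """Turn an adjacency matrix into a random-walk transition matrix P = D^-1 A."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diffuse(adj, features, weights):
    """Return sum_k weights[k] * P^k X, where P is the row-normalised adjacency."""
    p = row_normalize(adj)
    term = features  # P^0 X
    out = [[0.0] * len(features[0]) for _ in features]
    for w in weights:
        out = [[o + w * t for o, t in zip(orow, trow)] for orow, trow in zip(out, term)]
        term = matmul(p, term)  # advance to the next diffusion step
    return out


# Assumed toy graph: regions 0 - 1 - 2 in a line; only region 0 carries risk signal.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
diffused = diffuse(adj, features, weights=[0.5, 0.3, 0.2])
```

Region 2 is not adjacent to region 0, yet it receives a non-zero value through the two-step term, which is the high-order effect a single graph convolution layer would miss.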
Citations: 0
GTBAD: GVSAO-Transformer-BiLSTM-based time-series anomaly detection for photovoltaic power generation
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07171-8
Lin Zhu, Shaojuan Ma, Changlin Xu, Xinyi Xu, Haichuan Du

With the acceleration of the global energy transition, photovoltaic (PV) power generation, as an essential component of clean energy, has become increasingly important in promoting green and low-carbon development and achieving sustainable energy goals. However, the time-series data generated by PV power generation systems often contain outliers caused by equipment failures or environmental changes. Previous studies have shown that reconstruction-based anomaly detection methods struggle to model the global dependencies and local dynamics of time-series data simultaneously, and are susceptible to overfitting. Against the background of rapid expansion in PV installed capacity and sustained growth in monitoring data, coupled with a scarcity of precise annotations, unsupervised anomaly detection in multivariate time series has become a critical issue for ensuring the safe and economical operation of large-scale PV plants. For this reason, this article proposes GTBAD (GVSAO-Transformer-BiLSTM for anomaly detection), an unsupervised anomaly detection model. The model introduces an improved Snow Ablation Optimization algorithm (GVSAO) during the offline training phase to optimize model parameters, thereby enhancing global search capability and improving the efficiency of parameter optimization. The encoder employs a Transformer module with positional encoding to capture global dependencies in the time series, and regularization techniques are incorporated at each layer to mitigate overfitting. During decoding, a BiLSTM module is used to further exploit local temporal patterns via bidirectional recurrent modeling, enabling comprehensive utilization of contextual information in PV data. Finally, the reconstructed sequence is generated through a fully connected output layer, and anomaly detection is performed by comparing the reconstruction error against a predefined threshold. We compare the GTBAD model with existing anomaly detection methods on four public datasets and verify the effectiveness of each module through ablation experiments. In addition, this study incorporates attention visualization and variable-level attribution analysis to enhance model interpretability and improve its operational usability. The experimental results show the effectiveness and superiority of the GTBAD model in PV anomaly detection.
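The final detection step, comparing reconstruction error against a predefined threshold, can be sketched in a few lines. The sketch makes several assumptions: the error is the mean squared error per time step across variables, the threshold 0.5 is arbitrary, and the "reconstructed" values are hard-coded stand-ins for what the Transformer-BiLSTM model would output.

```python
def reconstruction_errors(original, reconstructed):
    """Mean squared error per time step, averaged over all variables."""
    return [sum((o - r) ** 2 for o, r in zip(orow, rrow)) / len(orow)
            for orow, rrow in zip(original, reconstructed)]


def detect(errors, threshold):
    """Indices of time steps whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]


# Assumed toy data: five time steps, two PV variables (e.g. power and voltage).
original = [[1.0, 2.0], [1.1, 2.1], [5.0, 2.0], [1.0, 1.9], [0.9, 2.0]]
# Stand-in for the model output: a model trained on normal data fails to reproduce the spike.
reconstructed = [[1.0, 2.0], [1.0, 2.0], [1.2, 2.1], [1.0, 2.0], [1.0, 2.0]]

errors = reconstruction_errors(original, reconstructed)
anomalies = detect(errors, threshold=0.5)
```

Only the third time step is flagged, because the spike in the first variable is the one point the stand-in reconstruction cannot follow. In practice the threshold would be chosen from the error distribution on normal data rather than fixed by hand.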
Citations: 0
Proposed dynamic fish counting method in a multivariate environment based on YOLO model and ONNX format
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07153-w
Nguyen Minh Son, Huynh Cao Tuan, Phan Thi Huong, Nguyen Khac Hoang, Niusha Shafiabady, Thanh Q. Nguyen

This study presents a deployment-oriented fish-counting pipeline for multivariate aquaculture environments, where illumination, turbidity, depth, and fish density jointly affect detection reliability. Using a dataset of 6,500 labeled 1080p images collected across Vietnamese aquaculture sites, we benchmark two unmodified Ultralytics detectors (YOLO11 and YOLOv8), quantify the contribution of scenario-informed preprocessing and augmentation, and export the selected model to ONNX for execution on heterogeneous hardware (GPU, CPU, and embedded edge devices). On the test set, YOLO11 achieves $mAP_{50-95} = 0.993$ and YOLOv8 achieves 0.995; under the operational constraint of $mAP_{50-95} \ge 0.99$, YOLO11 is selected due to consistently lower latency. ONNX export accelerates YOLO11 inference by ~25% while preserving accuracy within $\Delta mAP_{50-95} \le 0.001$, and a Wilcoxon signed-rank test confirms that the latency advantage is statistically significant (p < 0.05). Field deployment in three fish ponds further demonstrates reliable counting and proof-of-concept alerting via short-horizon aggregation of frame-wise counts. Importantly, the contribution of this work is system-level: an environment-stratified evaluation protocol, a data-centric robustness analysis, and an ONNX-based “train once, deploy across devices” monitoring pipeline, rather than a new detection architecture.
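Short-horizon aggregation of frame-wise counts can be illustrated with a rolling median followed by a tolerance check. The abstract does not say which aggregate or alert rule the authors use, so the median, the horizon of 3 frames, the expected count of 50, and the tolerance of 10 are all assumptions, and the per-frame counts are invented rather than produced by a detector.

```python
from collections import deque
from statistics import median


def smoothed_counts(frame_counts, horizon=3):
    """Median of the most recent `horizon` per-frame counts, emitted once per frame."""
    window = deque(maxlen=horizon)
    out = []
    for count in frame_counts:
        window.append(count)
        out.append(median(window))
    return out


def alert_frames(smoothed, expected, tolerance):
    """Frame indices where the smoothed count deviates from the expected stock by more than tolerance."""
    return [i for i, c in enumerate(smoothed) if abs(c - expected) > tolerance]


# Assumed detector output: one glitch frame (12), then a sustained drop to about 30 fish.
frame_counts = [50, 51, 12, 50, 49, 50, 30, 29, 28, 30]
smoothed = smoothed_counts(frame_counts, horizon=3)
alerts = alert_frames(smoothed, expected=50, tolerance=10)
```

The single-frame glitch is absorbed by the median and raises no alert, whereas the sustained drop triggers alerts once it fills most of the window. That trade-off between responsiveness and false alarms is what the horizon length controls.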
{"title":"Proposed dynamic fish counting method in a multivariate environment based on YOLO model and ONNX format","authors":"Nguyen Minh Son,&nbsp;Huynh Cao Tuan,&nbsp;Phan Thi Huong,&nbsp;Nguyen Khac Hoang,&nbsp;Niusha Shafiabady,&nbsp;Thanh Q. Nguyen","doi":"10.1007/s10489-026-07153-w","DOIUrl":"10.1007/s10489-026-07153-w","url":null,"abstract":"<div><p>This study presents a deployment-oriented fish-counting pipeline for multivariate aquaculture environments, where illumination, turbidity, depth, and fish density jointly affect detection reliability. Using a dataset of 6,500 labeled 1080p images collected across Vietnamese aquaculture sites, we benchmark two unmodified Ultralytics detectors (YOLO11 and YOLOv8), quantify the contribution of scenario-informed preprocessing and augmentation, and export the selected model to ONNX for execution on heterogeneous hardware (GPU, CPU, and embedded edge devices). On the test set, YOLO11 achieves <span>(:mA{P}_{50-95}=:0.993)</span> and YOLOv8 achieves 0.995; under the operational constraint of <span>(:mA{P}_{50-95}ge::0.99)</span>, YOLO11 is selected due to consistently lower latency. ONNX export accelerates YOLO11 inference by ~ 25% while preserving accuracy within <span>(:varDelta:mA{P}_{50-95}le::0.001)</span>, and a Wilcoxon signed-rank test confirms that the latency advantage is statistically significant (<i>p</i> &lt; 0.05). Field deployment in three fish ponds further demonstrates reliable counting and proof-of-concept alerting via short-horizon aggregation of frame-wise counts. 
Importantly, the contribution of this work is system-level: an environment-stratified evaluation protocol, a data-centric robustness analysis, and an ONNX-based “train once, deploy across devices” monitoring pipeline—rather than a new detection architecture.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"56 5","pages":""},"PeriodicalIF":3.5,"publicationDate":"2026-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging systems and control theory for social robotics: a model-based behavioral control approach to human-robot interaction
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07100-9
Maria L. Morão Patrício, Anahita Jamshidnejad

Social robots (SRs) are increasingly expected to assist in healthcare, education, and companionship, thereby addressing the growing need for personalized and affordable health and social care. However, sustaining long-term user engagement remains a major challenge for SRs, largely due to their limited understanding of human mental states. Accordingly, we leverage a recently introduced mathematical dynamic model of human perception, cognition, and decision-making for behavioral control of SRs. By identifying the parameters of this model and deploying it within a model-based behavioral steering system, SRs can autonomously adapt their actions to evolving user mental states, enhancing long-term engagement and personalization. To achieve this, we introduce the first integration of a systems-theoretic cognitive model into a closed-loop predictive behavioral control framework for SRs, formulated as a constrained multi-objective optimization problem that enables transparent, cognition-aware adaptation. In experiments with 10 participants interacting with a Nao robot across three chess puzzle sessions (45 to 90 minutes each), the identified model achieved a mean squared error (MSE) of 0.067 (i.e., 1.675% of the maximum possible MSE) in tracking beliefs, goals, and emotions of participants, and increased engagement by 16% (p = 0.009) compared to a model-free baseline. Post-interaction participant questionnaires further confirmed the perceived engagement and awareness of the model-based controller. Overall, the framework provides a practical pathway toward SRs that autonomously adapt to users in real time, sustain long-term engagement, and ultimately deliver more effective and personalized assistance in domains such as healthcare, education, and companionship.
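As a quick consistency check on the reported error (an inference from the two figures in the abstract, not a statement made by the authors): if an MSE of 0.067 corresponds to 1.675% of the maximum possible MSE, that maximum is 4. This is the value one would obtain if each tracked state variable were normalized to the range [-1, 1], since the largest possible error is then 2 and its square is 4. The normalization range is an assumption; the abstract does not state it.

\[
\mathrm{MSE}_{\max} = \frac{0.067}{0.01675} = 4 = \bigl(1 - (-1)\bigr)^{2}
\]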

Citations: 0
Exploiting distribution-based confidence integration in graph neural network recommenders
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07187-0
Joel Machado Pires, Eduardo Ferreira da Silva, Frederico Araújo Durão

Recommender systems assist users in navigating information-rich environments by delivering personalized content. While model-based collaborative filtering approaches, such as matrix factorization (MF) and graph neural networks (GNN), are widely adopted, the inherent uncertainty in user preferences and the sparsity of data can lead to unreliable predictions. Confidence estimation has emerged as a strategy to quantify prediction reliability, yet its integration remains unexplored in GNN-based models, and prior methods often degrade accuracy or suffer from convergence issues. This study benchmarks four prominent confidence-aware models (OrdRec, Confidence-aware Probabilistic Matrix Factorization, Confidence-aware Bayesian Probabilistic Matrix Factorization, and Lightweight Beta Distribution) across three public datasets: Amazon Movies and TV, Jester Joke, and MovieLens. We evaluate these models in terms of rating accuracy, ranking quality, and confidence estimate quality. In addition, we propose a novel confidence-integrated model based on a deep graph attention network architecture. Experimental results reveal that while distribution-based confidence methods are highly sensitive to dataset characteristics and can harm accuracy, the proposed method demonstrates consistent performance across all datasets and metrics, outperforming prior distribution-based models. Nevertheless, challenges remain in aligning confidence estimates with prediction error.
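To make the idea of distribution-based confidence concrete, the sketch below treats a predicted rating as a Beta distribution over the normalized rating scale and derives a confidence score from its spread. This is not the authors' model or any of the benchmarked methods; the class name `BetaRating`, the 1-to-5 rating scale, and the `1 - 4 * variance` confidence score are assumptions chosen for illustration (a Beta distribution's variance is always below 1/4, so the score stays between 0 and 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaRating:
    """Hypothetical predicted rating as a Beta(alpha, beta) distribution on [0, 1]."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected normalized rating.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Standard Beta variance: alpha*beta / ((alpha+beta)^2 * (alpha+beta+1)).
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1.0))

    @property
    def confidence(self) -> float:
        # Illustrative score: a narrower distribution gives a value closer to 1.
        return 1.0 - 4.0 * self.variance

    def rating(self, low: float = 1.0, high: float = 5.0) -> float:
        # Map the normalized mean back to the (assumed) rating scale.
        return low + (high - low) * self.mean


# Same expected rating, different amounts of evidence.
sparse_user = BetaRating(alpha=8.0, beta=2.0)
active_user = BetaRating(alpha=80.0, beta=20.0)
```

Both predictions map to the same point rating, but `active_user` carries a higher confidence score because its distribution is more concentrated. This separation of the point estimate from its reliability is what the abstract's "confidence estimate quality" metric evaluates.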

Citations: 0
MIT-CA: Multi-modal interaction transformer with cross-attention for malware classification
IF 3.5 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-09 DOI: 10.1007/s10489-026-07178-1
Meng Zhao, Peng Yin, Hu Wang, Fangyuan Hou, Xuesong Wang, Yurong Song, Chunyu Yao

The malware classification task involves systematically categorizing malware based on its distinctive characteristics, behavior patterns, and functional attributes. Existing classification methods typically rely on machine learning techniques based on single features or on unimodal image classification techniques. These are insufficient to address current complex and diverse malware threats, because they cannot fully capture and integrate the multidimensional characteristics that malware exhibits at different levels, or the potential interrelationships among them. To this end, we propose a malware classification method based on multi-modal feature interaction, namely the Multi-modal Interaction Transformer with Cross-Attention (MIT-CA). This model leverages both BYTES files, which represent malware in hexadecimal format, and ASM files obtained from disassembly, converting them into comprehensible text and grayscale images, respectively. By employing multi-modal learning, it aggregates information from multiple data sources, enabling the model to learn more comprehensive representations. During the multi-modal interaction, a transformer encoding structure based on Cross-Attention is used to facilitate information exchange between different modal features. This design guides the model to learn malware features more comprehensively and allows for independent training of each modal feature. Extensive experiments conducted on malware classification datasets verify the effectiveness of the proposed model on the malware classification task.
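The cross-attention exchange between modalities can be illustrated with a dependency-free sketch of scaled dot-product attention, in which queries come from one modality and keys and values from the other. This is a generic illustration rather than the MIT-CA architecture: learned projection matrices, multiple heads, and the surrounding encoder layers are omitted, and the toy feature vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention with queries from one modality and
    keys/values from the other. Learned projections are omitted."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    width = len(values[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key in the other modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other modality's value vectors.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return attended


# Hypothetical toy features: 2 text tokens and 3 image patches, both 2-dimensional.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Text attends to image content; the result has one row per text token.
text_given_image = cross_attention(text_tokens, image_patches, image_patches)
# Image attends to text content; the result has one row per image patch.
image_given_text = cross_attention(image_patches, text_tokens, text_tokens)
```

Running the attention in both directions gives each modality a representation conditioned on the other, which is the kind of information exchange the abstract describes.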

Citations: 0