
Applied Intelligence: Latest Articles

A hybrid CNN-transformer network with difference enhancement and frequency fusion for remote sensing image change detection
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-18; DOI: 10.1007/s10489-025-07034-8
Meiru Wang, Sheng Fang, Yunfan Li, Xingli Zhang, Zhe Li

With the rapid advancement of deep learning, remote sensing image change detection (CD) has made significant progress. The complementary strengths of convolutional neural networks (CNNs) and Transformers have attracted considerable attention. This has prompted researchers to explore CNN-Transformer parallel architectures for CD tasks. However, existing methods often rely on spatial-domain operations, such as convolution and pooling, to integrate local and global features, which can result in the loss of fine-grained details, leading to incomplete detection of small changes and blurred boundaries in change regions. Additionally, many methods employ a single strategy for difference extraction, limiting their ability to model temporal dependencies and making them susceptible to irrelevant environmental variations, which can cause pseudo-changes. To address these challenges, we propose a Hybrid CNN-Transformer Network with Difference Enhancement and Frequency Fusion (HCTFNet). HCTFNet introduces a Temporal Feature Fusion Module (TFFM) that efficiently extracts difference features via a dual-branch operation, while integrating an attention mechanism to highlight actual changes and suppress irrelevant noise. Furthermore, the Hybrid CNN-Transformer Fusion (HCTF) module extracts both local and global features and applies frequency-domain processing to enhance the interaction between local and global features, thereby preserving fine-grained spatial details more effectively. Extensive experiments conducted on three publicly available benchmark datasets demonstrate that HCTFNet achieves superior CD performance compared to existing mainstream methods.

Citations: 0
A review of predictive healthcare treatment based on text mining
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-18; DOI: 10.1007/s10489-025-06849-9
Mukun Cao, Bing Li

In this paper, we investigate optimization strategies for predicting high-mortality diseases in response to the challenges of unstable data quality and missing labels in healthcare big data. First, we present a systematic review of the current state of research on, and application of, machine learning algorithms for predicting heart disease, cardiovascular and cerebrovascular disease, lung cancer, diabetes, and breast cancer. The core contributions are: 1) Semi-supervised self-training experiments were conducted on 14 datasets with labeling ratios of 30% to 95%, and the performance of classifiers such as SVM, KNN, LR, AB, DT, and RF was evaluated. Data balance was found to be crucial for improving the accuracy of most classifiers, and sensitivity to the labeling ratio varied significantly between classifiers (e.g., SVM is robust to it, whereas KNN depends on it). 2) An in-depth analysis of feature importance identifies the key attributes of each dataset, whose absence significantly degrades model performance, while RF exhibits the best robustness due to its integration properties. The study provides methodological references and empirical evidence for precision medicine prediction under limited labeling conditions.
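To make the self-training setup concrete, here is a minimal sketch in plain Python. It is not the study's pipeline: the 1-D nearest-centroid classifier, the margin-based confidence rule, and the names `fit_centroids`, `predict`, `self_train`, and `min_margin` are all assumptions chosen for illustration. The labeling ratio corresponds to the fraction of points that start in `labeled`.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Return the mean feature value of each class (a 1-D nearest-centroid model)."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Return (label, margin); margin is the distance gap between the two closest centroids."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin

def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit the model.

    Assumes at least two classes are present in `labeled`.
    """
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from
        xs += [x for x, _ in confident]
        ys += [y for _, y in confident]
        pool = [x for x, _ in rest]
    return fit_centroids(xs, ys), pool

# Two labeled points, five unlabeled ones; the ambiguous point 5.0 is never pseudo-labeled.
centroids, leftover = self_train([(0.0, "neg"), (10.0, "pos")], [1.0, 2.0, 9.0, 8.0, 5.0])
```

The confidence threshold is the main design choice: a low `min_margin` adds more pseudo-labels per round but risks reinforcing early mistakes, which is one plausible reason classifiers differ in their sensitivity to the labeling ratio.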

Citations: 0
UMLGA: unsupervised graph meta-learning via local subgraph augmentation
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-18; DOI: 10.1007/s10489-025-06274-y
Ningbo Huang, Gang Zhou, Meng Zhang, Yi Xia, Shunhang Li

Graph meta-learning models can adapt quickly to new tasks with extremely limited labeled data by learning transferable meta-knowledge and inductive bias on graphs. Existing methods construct meta-training tasks from base classes with abundant labeled nodes, which limits the application scenarios of graph meta-learning. Therefore, we propose an unsupervised graph meta-learning framework via local subgraph augmentation (UMLGA). Specifically, we first propose a graph clustering-based sampling method to sample anchor nodes from different natural classes and extract the corresponding local subgraphs. Then, assuming that the augmented samples generated from one subgraph share the same label, we design structure-wise and feature-wise graph augmentation strategies to generate diverse augmented subgraphs while keeping the semantics unchanged. Finally, we perform meta-training on the tasks constructed without supervision, using a weighted meta-loss, which extracts cross-task knowledge for fast adaptation to novel classes. To evaluate the effectiveness of UMLGA, a series of experiments is conducted on four real-world graph datasets. Experimental results show that, even without relying on extensive labeled data, UMLGA achieves comparable or even better few-shot node classification performance compared with the supervised graph meta-learning backbone models. With GPN as the backbone model, the improvements of UMLGA are 3.0% to 9.3%, 4.4% to 11.6%, -1.2% to 9.3%, and 1.8% to 15.1% on the Amazon-Clothing, Amazon-Electronics, DBLP, and ogbn-products datasets, respectively.
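The subgraph extraction and the two augmentation families can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say which augmentations UMLGA uses, so random edge dropping and random feature masking stand in for "structure-wise" and "feature-wise" augmentation, and all function names and the `hops`, `drop_prob`, and `mask_prob` parameters are hypothetical.

```python
import random
from collections import deque

def local_subgraph(adj, anchor, hops):
    """Return the set of nodes within `hops` edges of `anchor` (breadth-first search)."""
    depth = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nb in adj[node]:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return set(depth)

def induced_edges(adj, nodes):
    """Undirected edges whose endpoints both lie in `nodes`, each listed once."""
    return sorted((u, v) for u in nodes for v in adj[u] if v in nodes and u < v)

def drop_edges(edges, drop_prob, rng):
    """Structure-wise augmentation (assumed): remove each edge with probability `drop_prob`."""
    return [e for e in edges if rng.random() >= drop_prob]

def mask_features(features, mask_prob, rng):
    """Feature-wise augmentation (assumed): zero each feature entry with probability `mask_prob`."""
    return {n: [0.0 if rng.random() < mask_prob else x for x in vec]
            for n, vec in features.items()}

# Toy undirected graph stored as adjacency lists.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
nodes = local_subgraph(adj, anchor=0, hops=1)
edges = induced_edges(adj, nodes)
rng = random.Random(0)
view_a = drop_edges(edges, 0.3, rng)
view_b = drop_edges(edges, 0.3, rng)
```

Under the abstract's assumption, `view_a` and `view_b` would receive the same pseudo-label because they come from the same anchor node, which is what lets few-shot tasks be built without ground-truth labels.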

Citations: 0
Multimodal biometric recognition with cancelable template protection using deep learning and an optimized bloom filter
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-17; DOI: 10.1007/s10489-025-07029-5
Md Sabuj Khan, Ting Zhong, Hengjian Li, Fan Zhou

Multimodal biometric systems provide advantages over unimodal systems, such as improved accuracy, spoofing resistance, and broader population coverage. However, challenges related to privacy and template protection remain. To address these issues, we propose an integrated framework for multimodal biometric recognition with cancelable template protection, a privacy-preserving mechanism, using deep learning and an optimized bloom filter to significantly enhance recognition performance while ensuring robust security and privacy. First, a key mapping and management system is designed to generate secure keys that support the entire framework. Second, the Dynamic Attention and Hash Network (DAHNet) is employed to extract discriminative palmprint features through a hybrid attention mechanism and a deep hashing network. Third, a quantized fingerprint feature mapping technique is used to generate the corresponding binary fingerprint vector. Finally, the system applies an XOR operation to fuse DAHNet-extracted palmprint features and quantized fingerprint features, followed by an optimized bloom filter to generate secure cancelable templates, ensuring cancelability, irreversibility, and protection against template reconstruction. Experimental evaluations on the TJU and PolyU palmprint datasets, as well as the FVC2002 fingerprint dataset, demonstrate the outstanding accuracy of our state-of-the-art approach, achieving a remarkably low Equal Error Rate (EER). Comparative analysis further shows that the proposed multimodal system significantly outperforms unimodal systems in both recognition accuracy and security. Moreover, security analysis confirms that the framework satisfies all critical requirements for cancelable biometric template protection, including irreversibility, unlinkability, revocability, and robust privacy against various attack scenarios.
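The final fusion step (XOR of two binary feature vectors, then a Bloom-filter mapping) can be illustrated with a simplified sketch. It does not reproduce the paper's optimized bloom filter or its key management: the single-filter layout, the `word_size` parameter, and the way the user-specific `key` is mixed in are assumptions made for this example.

```python
def xor_fuse(palm_bits, finger_bits):
    """Fuse two equal-length binary vectors with element-wise XOR."""
    if len(palm_bits) != len(finger_bits):
        raise ValueError("feature vectors must have equal length")
    return [a ^ b for a, b in zip(palm_bits, finger_bits)]

def bloom_template(bits, word_size, key):
    """Map a binary vector to a Bloom-filter template (simplified, assumed scheme).

    The vector is cut into words of `word_size` bits. Each word is XORed with
    the user-specific `key`, and the resulting integer selects one position to
    set in a filter of 2**word_size bits. Repeated words collapse onto the same
    position, so the mapping is many-to-one and word order is discarded.
    """
    if len(bits) % word_size:
        raise ValueError("vector length must be a multiple of word_size")
    if not 0 <= key < 2 ** word_size:
        raise ValueError("key must fit in word_size bits")
    template = [0] * (2 ** word_size)
    for start in range(0, len(bits), word_size):
        word = int("".join(map(str, bits[start:start + word_size])), 2)
        template[word ^ key] = 1
    return template

palm = [1, 0, 1, 1, 0, 0, 1, 0]      # stands in for a deep-hash palmprint code
finger = [0, 1, 1, 0, 0, 1, 1, 0]    # stands in for a quantized fingerprint vector
fused = xor_fuse(palm, finger)
old_template = bloom_template(fused, word_size=2, key=1)
new_template = bloom_template(fused, word_size=2, key=2)  # reissued with a new key
```

Changing the key yields a different template from the same biometric data, which is the property behind revocability. The many-to-one mapping is what makes inverting the template harder, although real irreversibility depends on parameters far larger than this toy example.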

Citations: 0
A dual-pathway transformer utilizing dynamic weighting strategy for time series forecasting
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-17; DOI: 10.1007/s10489-025-07033-9
Wenjie Liu, Sen Wang

Time series forecasting has broad applications in fields such as finance, energy management, and weather prediction. PatchTST, a widely recognized model, has significantly improved the input length and accuracy of time series predictions. However, it uniformly treats all patches without considering their varying impacts on future predictions and relies heavily on global attention, often neglecting local information. To address these issues, we have developed an enhanced model: Dynamic Feature Weighting PatchTST (DynamicPatchTST). This model features three key enhancements: (i) a dynamic feature weighting strategy that assigns weights to each patch based on the characteristics of individual time series, emphasizing more impactful patches; (ii) a linear embedding method to replace positional encoding, preventing the over-adjustment or redundant modification of patch weights; (iii) a dual-pathway strategy that integrates local and global information, with the local pathway refined through gated adaptive adjustments to enhance the model’s ability to capture local details. Extensive experiments across multiple standard time series datasets (e.g., ETT, Exchanges, Weather) have demonstrated that our DynamicPatchTST significantly outperforms existing state-of-the-art models, with an average improvement of 4.4% in Mean Squared Error (MSE). This work provides a novel perspective on time series forecasting and paves the way for future advancements in the field.
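As a rough illustration of enhancement (i), the sketch below splits a series into patches and assigns each patch a weight. The abstract does not specify how DynamicPatchTST computes its weights, so the scoring rule here (patch variance passed through a softmax) is purely an assumption, as are the function names and the `patch_len` and `stride` parameters.

```python
import math

def make_patches(series, patch_len, stride):
    """Cut a 1-D series into (possibly overlapping) fixed-length patches."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]

def patch_scores(patches):
    """Score each patch by its variance (an assumed stand-in for a learned scorer)."""
    scores = []
    for patch in patches:
        mu = sum(patch) / len(patch)
        scores.append(sum((x - mu) ** 2 for x in patch) / len(patch))
    return scores

def softmax(scores):
    """Numerically stable softmax, so that the weights are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_patches(patches, weights):
    """Scale every value of a patch by that patch's weight."""
    return [[w * x for x in patch] for patch, w in zip(patches, weights)]

series = [1.0, 1.0, 1.0, 1.0, 0.0, 4.0, 0.0, 4.0]
patches = make_patches(series, patch_len=4, stride=4)
weights = softmax(patch_scores(patches))
weighted = weight_patches(patches, weights)
```

In this toy input the flat first patch receives a smaller weight than the oscillating second patch. In a trained model the scores would come from learned parameters rather than a fixed statistic.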

Citations: 0
A joint question-answering model based on split-hop information propagation and heterogeneous representation alignment
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-16; DOI: 10.1007/s10489-025-07039-3
Gui Chen, Xianhui Liu, Wenlong Hou, Qiujun Deng

Due to the complementary strengths of language models and knowledge graphs, question-answering systems that combine both techniques have emerged. However, existing models suffer from issues such as introducing excessive misleading information and an unreasonable distribution of heterogeneous representations, which negatively affect answer accuracy and reasoning efficiency. To address these problems, we propose a question-answering model based on split-hop information propagation and heterogeneous representation alignment. First, we train a graph attention network using the split-hop information propagation method along multi-hop reasoning paths, enhancing the relevance between extracted sub-graphs and the question context while allowing the network to learn differences across reasoning hops. Next, we align the heterogeneous representations of text and knowledge graph through post-training, ensuring the pre-trained representations fall into a reasonable parameter distribution. We validate the effectiveness and generalizability of our model on the CommonsenseQA and OpenBookQA datasets in the commonsense domain, as well as the MedQA-USMLE dataset in the biomedical domain, achieving accuracy improvements of 0.5%, 0.6%, and 1.2% over baseline models of the same type, respectively.

Citations: 0
PRANet: Pathological relationship perception and dual attention guided network for diabetic retinopathy grading
IF 3.5; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-12-16; DOI: 10.1007/s10489-025-07036-6
Yanfei Guo, Hangli Du, Zhenhua Zhang, Yuncui Wang, Dapeng Li, Dengwang Li

Diabetic retinopathy (DR) is one of the most common ocular complications in diabetic patients. Early DR screening is therefore crucial for timely diagnosis and for preventing disease deterioration. However, DR grading is challenging because lesions are tiny and unevenly distributed, and the complex pathological relationships among them are difficult to capture. This paper proposes the pathological relationship feature perception and dual attention guided network (PRANet) for DR grading. First, a pathological relationship feature perception module (PRFM) is introduced to capture the pathological relationships between different types of lesions. A lesion detection network is used to locate the rough regions of the lesions, and K-Means clustering is performed on the lesion features to generate lesion nodes. The co-occurrence relationships between lesion nodes are used to construct an adjacency matrix, which is then input into a graph convolutional network to explore the complex relationships among the lesion nodes. Moreover, the EfficientNetV2-M backbone is employed to obtain global information from DR images, which is then fed into a dual attention module (DAM). The DAM utilizes spatial and channel attention to emphasize the features of suspicious lesion regions. Pathological relationship features, channel features, and spatial features are integrated to achieve precise DR grading. Extensive experiments were carried out on the DDR, APTOS2019, and FGADR datasets. Results show that our method exceeds most existing approaches in classification accuracy and robustness.
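The step from lesion co-occurrence to a graph convolution input can be sketched as follows. The abstract does not say how PRANet counts, thresholds, or normalizes co-occurrences, so this example assumes raw pairwise counts and the widely used symmetric normalization with self-loops. The function names are hypothetical, and the learned weight matrix and nonlinearity of a real graph convolutional layer are omitted.

```python
from itertools import combinations

def cooccurrence_adjacency(image_lesions, num_nodes):
    """Count how often each pair of lesion nodes appears in the same image."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for nodes in image_lesions:
        for i, j in combinations(sorted(set(nodes)), 2):
            adj[i][j] += 1.0
            adj[j][i] += 1.0
    return adj

def normalize_with_self_loops(adj):
    """Symmetric normalization D^(-1/2) (A + I) D^(-1/2), common in graph convolutions."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]

def propagate(norm_adj, features):
    """One parameter-free propagation step: mix each node's features with its neighbours'."""
    n, dim = len(norm_adj), len(features[0])
    return [[sum(norm_adj[i][k] * features[k][c] for k in range(n)) for c in range(dim)]
            for i in range(n)]

# Each inner list holds the lesion-node indices found in one image (toy data).
image_lesions = [[0, 1], [0, 1, 2], [2]]
adj = cooccurrence_adjacency(image_lesions, num_nodes=3)
norm_adj = normalize_with_self_loops(adj)
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = propagate(norm_adj, node_features)
```

Adding self-loops before normalizing keeps each node's own features in the mix and avoids division by zero for a lesion node that never co-occurs with another.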

Citations: 0
Self-supervised learning radiomics nomogram integrating anatomical structures can identify cerebellar hypoplasia in prenatal ultrasound
IF 3.5 Zone 2 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-16 DOI: 10.1007/s10489-025-06986-1
Ruifan He, Yiling Ma, Xiaoxiao Wu, Fu Liu, Liuyue Li, Tongquan Wu, Guoping Xu, Chen Cheng, Sheng Zhao, Xinglong Wu
Background: The early and accurate identification of fetal cerebellar hypoplasia (CH) during the prenatal stage is crucial for timely intervention and decision-making. Medical ultrasound is the primary tool for CH diagnosis. However, its accuracy may be limited by imaging artifacts and subjective judgments among sonographers. Artificial intelligence provides an effective tool for improving the diagnostic accuracy and consistency of ultrasound imaging.
Objective: This study aims to develop and validate a self-supervised learning radiomics nomogram (SSRN) that integrates the anatomical structures of the fetal skull, cerebellum, and cistern in order to assess prenatal risk for fetal CH, to identify significant factors that may influence CH diagnosis, and to evaluate the diagnostic efficacy of SSRN in clinical applications.
Method: This retrospective study included clinical data and ultrasound images from 547 normal fetuses and 301 fetuses diagnosed with CH between September 2019 and September 2023 at the Ultrasound Diagnostic Department of Hubei Maternal and Child Health Hospital, China. The standard brain views were selected by experienced sonographers, who also delineated the contours of the skull, cerebellum, and cistern in each view. In the self-supervised learning strategy, VQ-VAE-2 was employed to extract latent features from these brain views, which were then converted into a self-supervised image score (SIS). Radiomics features were extracted from the combined region of interest (ROI) of the cerebellum and cistern to obtain a radiomics score (RS). The proposed SSRN was constructed by integrating significant demographic and morphological features identified through univariate and multivariate logistic regression analyses, along with SIS and RS. To validate the clinical potential of SSRN, several comparative models were established, including an expert model (EM) comprising three sonographers with varying years of clinical experience, a clinical regression model (CLM) based on clinical data, a self-supervised learning classification model (SSL-CM) based on latent features extracted by VQ-VAE-2, and a radiomics model (RM) based on radiomics features.
Results: The study identified several statistically significant influencing factors, including the width of the cistern, the area of the cistern, cerebellum and skull, the area ratio between the cistern and cerebellum, SIS, and RS. SSRN was constructed using these factors, achieving an accuracy of 0.906 and an AUC of 0.956. This was a significantly enhanced performance in comparison to alternative models and EM (accuracy: 0.782, AUC: 0.752). Furthermore, the integration of anatomical structures in SSRN and CLM (accuracy: 0.894, AUC: 0.934) performed better than RM (accuracy: 0.871, AUC: 0.934) and SSL-CM (accuracy: 0.622, AUC: 0.641), which lacked such incorporation. A subgroup analysis revealed markedly improved AUCs in subgroups with maternal age ≥ 30 years, gestational age ≥ 28.4 weeks, and above-average anatomical structure features.
Conclusion: SSRN performed well in identifying prenatal CH risk, highlighting the innovative application of self-supervised learning and anatomical features in prenatal CH screening. In addition, SSRN shows potential as a valuable complementary tool for identifying CH in high-risk pregnancies, thereby facilitating personalized prenatal management.
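A nomogram of this kind is usually a graphical reading of a logistic regression model, so the way SIS, RS, and morphological measurements could combine into one risk estimate can be sketched as below. This is only an illustrative sketch: the feature names, the coefficient values, and the assumption that inputs are standardized are hypothetical and are not taken from the study.

```python
from math import exp


def nomogram_risk(features, coefficients, intercept):
    """Logistic model: risk = sigmoid(intercept + sum of coefficient * feature)."""
    logit = intercept + sum(coefficients[name] * value
                            for name, value in features.items())
    return 1.0 / (1.0 + exp(-logit))


# Hypothetical coefficients and one hypothetical standardized case.
coefficients = {"sis": 1.2, "rs": 0.8, "cistern_cerebellum_area_ratio": 0.5}
case = {"sis": 0.4, "rs": -0.1, "cistern_cerebellum_area_ratio": 0.3}
risk = nomogram_risk(case, coefficients, intercept=-0.5)
```

In a fitted model the coefficients would come from the multivariate logistic regression; each predictor's contribution to the logit is what a printed nomogram displays as points.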
Citations: 0
Personalized dialogue generation through knowledge expansion and in-context learning
IF 3.5 Zone 2 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-16 DOI: 10.1007/s10489-025-07030-y
Zhewen Wang, Tinghuai Ma, Huan Rong, Li Jia

Personalized dialogue systems are an innovative application of conversational AI that aims to endow chatbots with distinct personas, addressing the lack of individuality and specificity in traditional human-computer interactions. Current approaches often fail to incorporate rich external knowledge, making it difficult to maintain coherence and depth in long-term personalized interactions. To bridge the gap between static persona design and dynamic, knowledge-enhanced personalized dialogue generation, our work is as follows: 1) We propose a novel Knowledge-expanded Personalized Dialogue Generation (KPDG) model that extends predefined personas using a commonsense knowledge graph of related personas. During the decoding of generated responses, this method adaptively integrates the most relevant personas that are optimally selected and partitioned with the dialogue history. 2) We design a two-stage prompting approach that leverages large language models (LLMs) for personalized dialogue generation. In the first stage, LLMs are used to enhance and expand the persona, enriching the persona's intrinsic characteristics and emotional state. In summary, the main contributions of this work are as follows: (a) identification of the key limitations of current persona-based dialogue systems and formulation of a knowledge-enhanced framework to address them; (b) a novel persona expansion approach that combines structured commonsense knowledge with adaptive selection; and (c) a two-stage LLM prompting paradigm that achieves state-of-the-art results without fine-tuning. Experiments on the PERSONA-CHAT dataset demonstrate that our approach outperforms strong baselines in both automatic metrics and human evaluations, validating its effectiveness in enhancing persona diversity, contextual consistency, and conversational engagement.
The novel contribution of this work is the KPDG framework, which first employs a knowledge graph-driven persona expansion module to enrich persona attributes and adaptively select context-relevant traits during decoding. Then, a two-stage LLM prompting strategy is applied: the first stage enhances and diversifies persona characteristics, while the second stage generates responses via in-context learning (ICL) without model retraining. This design not only addresses persona sparsity and consistency issues but also provides a scalable solution adaptable to different dialogue settings.
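The two-stage prompting strategy described above can be sketched as two chained calls to a language model: one that expands the persona, and one that answers with in-context demonstrations and no parameter updates. The sketch below is a rough illustration under stated assumptions: the prompt wording, the function names, and the `llm` callable (any function that maps a prompt string to a completion string) are hypothetical and are not the authors' prompts or API.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def build_expansion_prompt(persona: List[str]) -> str:
    """Stage 1 prompt: ask the model to enrich the given persona sentences."""
    traits = "\n".join(f"- {trait}" for trait in persona)
    return (
        "Expand the persona below with related traits and a likely emotional state.\n"
        f"Persona:\n{traits}\n"
        "Expanded persona:"
    )


def build_response_prompt(expanded_persona: str,
                          demonstrations: List[Tuple[str, str]],
                          history: List[str]) -> str:
    """Stage 2 prompt: expanded persona, few-shot demonstrations, then the dialogue history."""
    shots = "\n\n".join(f"Context: {c}\nResponse: {r}" for c, r in demonstrations)
    context = " ".join(history)
    return f"Persona:\n{expanded_persona}\n\n{shots}\n\nContext: {context}\nResponse:"


def two_stage_reply(llm: LLM, persona: List[str],
                    demonstrations: List[Tuple[str, str]],
                    history: List[str]) -> str:
    """Run persona expansion, then in-context response generation, with a frozen model."""
    expanded_persona = llm(build_expansion_prompt(persona))
    return llm(build_response_prompt(expanded_persona, demonstrations, history))
```

Because the model is only prompted, swapping in a different LLM means passing a different callable; nothing in the sketch requires fine-tuning.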

Citations: 0
CARM-Diff: a cross-attention guided diffusion model for game trajectory prediction
IF 3.5 Zone 2 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-15 DOI: 10.1007/s10489-025-07019-7
Zhiheng Zhang, Lina Lu, Chushu Yi, Tingting Wei, Jing Chen

In real decision-making scenarios involving multi-agent games, such as intelligent unmanned systems, military confrontations, and autonomous navigation, the coupling of participant strategies and the incompleteness of perceived information make the accurate inference of dynamic game trajectories a critical and challenging task. To address this problem, this paper proposes a game trajectory modeling method that integrates a cross-attention mechanism with a moving diffusion process, termed CARM-Diff (Cross-Attention Autoregressive Moving Diffusion Model). The model combines autoregressive structures with cross-attention to capture the temporal evolution of sequences while explicitly modeling strategic interactions between agents. We design a lightweight feature extraction module and, leveraging the Markov property of diffusion models, introduce a deterministic evolution process of historical states to simulate noise, thereby enhancing the model's capability to learn local temporal patterns. Meanwhile, a cross-attention mechanism is introduced in the reverse diffusion stage to guide the model in focusing on the opponent's historical sequential behavior, enabling more precise capture of inter-agent influences. Furthermore, we design a residual gated trajectory modeling structure that fuses the agent's own behavioral evolution with interaction effects induced by opponents. Gating factors are dynamically generated through multilayer perceptrons to achieve adaptive information fusion. We construct a dynamic trajectory dataset based on an underwater pursuit-evasion game to validate our approach, and the proposed CARM-Diff framework is generalizable to a wide range of multi-agent interactive systems. Experimental results show that CARM-Diff outperforms mainstream baseline methods in both prediction accuracy and dynamic interaction modeling, demonstrating the effectiveness and practical potential of the proposed model.
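The residual gated fusion mentioned in the abstract can be illustrated with one common formulation: a small multilayer perceptron reads the agent's own feature and the opponent-induced interaction feature, produces a gate in (0, 1) per dimension, and the gated interaction term is added to the agent's own feature. The exact form used in CARM-Diff is not given in the abstract, so the formula, function names, and layer sizes below are assumptions for illustration only.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_residual_fusion(h_self, h_inter, w1, b1, w2, b2):
    """Assumed form: out = h_self + gate * h_inter, gate = sigmoid(MLP([h_self; h_inter]))."""
    hidden = [max(0.0, v) for v in linear(w1, b1, h_self + h_inter)]  # ReLU hidden layer
    gate = [sigmoid(v) for v in linear(w2, b2, hidden)]               # one gate per dimension
    return [s + g * i for s, g, i in zip(h_self, gate, h_inter)]


# Toy example with 2-dimensional features and a 3-unit hidden layer.
h_self = [1.0, 2.0]    # agent's own behavioral feature
h_inter = [4.0, -2.0]  # interaction feature induced by the opponent
w1 = [[0.0] * 4 for _ in range(3)]
b1 = [0.0] * 3
w2 = [[0.0] * 3 for _ in range(2)]
b2 = [0.0] * 2
fused = gated_residual_fusion(h_self, h_inter, w1, b1, w2, b2)
```

With all weights and biases at zero the gate is exactly 0.5, so half of the interaction feature is added to the agent's own feature; trained weights would let the gate open or close per dimension depending on the input.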

Citations: 0