Pub Date: 2025-12-18 | DOI: 10.1007/s10489-025-07034-8
Meiru Wang, Sheng Fang, Yunfan Li, Xingli Zhang, Zhe Li
With the rapid advancement of deep learning, remote sensing image change detection (CD) has made significant progress. The complementary strengths of convolutional neural networks (CNNs) and Transformers have attracted considerable attention. This has prompted researchers to explore CNN-Transformer parallel architectures for CD tasks. However, existing methods often rely on spatial-domain operations, such as convolution and pooling, to integrate local and global features, which can result in the loss of fine-grained details, leading to incomplete detection of small changes and blurred boundaries in change regions. Additionally, many methods employ a single strategy for difference extraction, limiting their ability to model temporal dependencies and making them susceptible to irrelevant environmental variations, which can cause pseudo-changes. To address these challenges, we propose a Hybrid CNN-Transformer Network with Difference Enhancement and Frequency Fusion (HCTFNet). HCTFNet introduces a Temporal Feature Fusion Module (TFFM) that efficiently extracts difference features via a dual-branch operation, while integrating an attention mechanism to highlight actual changes and suppress irrelevant noise. Furthermore, the Hybrid CNN-Transformer Fusion (HCTF) module extracts both local and global features and applies frequency-domain processing to enhance the interaction between local and global features, thereby preserving fine-grained spatial details more effectively. Extensive experiments conducted on three publicly available benchmark datasets demonstrate that HCTFNet achieves superior CD performance compared to existing mainstream methods.
Title: A hybrid CNN-transformer network with difference enhancement and frequency fusion for remote sensing image change detection (Applied Intelligence, vol. 56, no. 1)
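The frequency-domain interaction described above can be illustrated with a minimal sketch (not the authors' implementation): assume the global Transformer branch carries coarse low-frequency structure and the local CNN branch carries high-frequency detail, and fuse them with a low-pass mask in the 2-D Fourier domain. The function name and the cutoff parameter are illustrative.

```python
import numpy as np

def frequency_fusion(local_feat, global_feat, cutoff=0.25):
    """Hypothetical sketch: fuse two single-channel feature maps in the
    frequency domain. Low frequencies (coarse structure) are taken from
    the global branch, high frequencies (fine detail) from the local
    branch, then the result is transformed back to the spatial domain."""
    h, w = local_feat.shape
    fl = np.fft.fftshift(np.fft.fft2(local_feat))
    fg = np.fft.fftshift(np.fft.fft2(global_feat))
    # Circular low-pass mask centred on the zero frequency.
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    low = dist <= cutoff * min(h, w)
    fused = np.where(low, fg, fl)
    return np.real(np.fft.ifft2(np.fft.ifftshift(fused)))

local = np.random.default_rng(0).normal(size=(32, 32))
out = frequency_fusion(local, np.ones((32, 32)))
print(out.shape)  # (32, 32)
```

The point of going through the Fourier domain is that a fixed spatial cutoff separates "global layout" from "fine boundary detail" cleanly, which spatial convolution or pooling tends to blur.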
Pub Date: 2025-12-18 | DOI: 10.1007/s10489-025-06849-9
Mukun Cao, Bing Li
In this paper, we investigate optimization strategies for predicting high-mortality diseases in response to the challenges of unstable data quality and missing labels in healthcare big data. First, we present a systematic review of the current state of research on machine learning algorithms for predicting heart disease, cardiovascular and cerebrovascular disease, lung cancer, diabetes, and breast cancer. The core contributions are: 1) Semi-supervised self-training experiments were conducted on 14 datasets with labeling ratios of 30%-95%, evaluating classifiers including SVM, KNN, LR, AB, DT, and RF. We found that data balance is crucial for improving the accuracy of most classifiers, and that sensitivity to the labeling ratio varies significantly across classifiers (e.g., SVM is robust to it, while KNN depends on it strongly). 2) An in-depth analysis of feature importance identifies the key attributes of each dataset, whose absence significantly degrades model performance, while RF exhibits the best robustness owing to its ensemble properties. The study provides methodological references and empirical evidence for precision-medicine prediction under limited labeling conditions.
Title: A review of predictive healthcare treatment based on text mining (Applied Intelligence, vol. 56, no. 1)
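The semi-supervised self-training setup described above (hide a fraction of the labels, let a base classifier pseudo-label the rest) can be sketched with scikit-learn's standard wrapper; the dataset here is synthetic and the 30% labeling ratio is one point from the paper's 30%-95% range.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Simulate a 30% labeling ratio by hiding 70% of the training labels
# (-1 marks "unlabeled" for scikit-learn's self-training wrapper).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)
masked = y_tr.copy()
masked[rng.random(len(masked)) > 0.30] = -1

# SVC needs probability estimates so the wrapper can threshold
# confident pseudo-labels on each self-training round.
model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
model.fit(X_tr, masked)
print(round(model.score(X_te, y_te), 2))
```

Swapping the base estimator (KNN, logistic regression, random forest, ...) reproduces the kind of per-classifier sensitivity comparison the paper reports.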
Pub Date: 2025-12-18 | DOI: 10.1007/s10489-025-06274-y
Ningbo Huang, Gang Zhou, Meng Zhang, Yi Xia, Shunhang Li
Graph meta-learning models can adapt quickly to new tasks with extremely limited labeled data by learning transferable meta-knowledge and inductive biases on graphs. Existing methods construct meta-training tasks with abundant labeled nodes from base classes, which limits the application scenarios of graph meta-learning. Therefore, we propose an unsupervised graph meta-learning framework via local subgraph augmentation (UMLGA). Specifically, we first propose a graph clustering-based sampling method to sample anchor nodes from different natural classes and extract the corresponding local subgraphs. Then, assuming that the generated augmented samples share the same labels, we design structure-wise and feature-wise graph augmentation strategies to generate diverse augmented subgraphs while keeping their semantics unchanged. Finally, we perform meta-training on the unsupervised constructed tasks with a weighted meta-loss, which extracts cross-task knowledge for fast adaptation to novel classes. To evaluate the effectiveness of UMLGA, a series of experiments is conducted on four real-world graph datasets. The results show that, even without relying on extensive labeled data, UMLGA achieves comparable and even better few-shot node classification performance compared with supervised graph meta-learning backbone models. With GPN as the backbone model, the improvements of UMLGA are 3.0∼9.3%, 4.4∼11.6%, −1.2∼9.3%, and 1.8∼15.1% on the Amazon-Clothing, Amazon-Electronics, DBLP, and ogbn-products datasets, respectively.
Title: UMLGA: unsupervised graph meta-learning via local subgraph augmentation (Applied Intelligence, vol. 56, no. 1)
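The two augmentation strategies named above can be sketched in a few lines; this is a generic illustration of structure-wise (edge dropping) and feature-wise (feature masking) views of a local subgraph, not UMLGA's exact operators.

```python
import numpy as np

rng = np.random.default_rng(0)

def structure_augment(adj, drop_prob=0.2):
    """Structure-wise view: randomly drop edges of an undirected
    subgraph, deciding each edge once and keeping the matrix symmetric."""
    mask = rng.random(adj.shape) > drop_prob
    mask = np.triu(mask, 1)
    keep = mask | mask.T
    return adj * keep

def feature_augment(x, mask_prob=0.2):
    """Feature-wise view: randomly zero out node feature dimensions."""
    return x * (rng.random(x.shape) > mask_prob)

adj = np.ones((5, 5)) - np.eye(5)   # toy local subgraph: a 5-clique
x = rng.normal(size=(5, 8))         # toy node features
print(structure_augment(adj).shape, feature_augment(x).shape)
```

Both views are assumed to keep the anchor node's label, which is exactly the "augmented samples share the same labels" premise the abstract states.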
Pub Date: 2025-12-17 | DOI: 10.1007/s10489-025-07029-5
Md Sabuj Khan, Ting Zhong, Hengjian Li, Fan Zhou
Multimodal biometric systems provide advantages over unimodal systems, such as improved accuracy, spoofing resistance, and broader population coverage. However, challenges related to privacy and template protection remain. To address these issues, we propose an integrated framework for multimodal biometric recognition with cancelable template protection, a privacy-preserving mechanism that uses deep learning and an optimized bloom filter to significantly enhance recognition performance while ensuring robust security and privacy. First, a key mapping and management system is designed to generate secure keys that support the entire framework. Second, the Dynamic Attention and Hash Network (DAHNet) is employed to extract discriminative palmprint features through a hybrid attention mechanism and a deep hashing network. Third, a quantized fingerprint feature mapping technique is used to generate the corresponding binary fingerprint vector. Finally, the system applies an XOR operation to fuse the DAHNet-extracted palmprint features and the quantized fingerprint features, followed by an optimized bloom filter to generate secure cancelable templates, ensuring cancelability, irreversibility, and protection against template reconstruction. Experimental evaluations on the TJU and PolyU palmprint datasets, as well as the FVC2002 fingerprint dataset, demonstrate the outstanding accuracy of our state-of-the-art approach, achieving a remarkably low Equal Error Rate (EER). Comparative analysis further shows that the proposed multimodal system significantly outperforms unimodal systems in both recognition accuracy and security. Moreover, security analysis confirms that the framework satisfies all critical requirements for cancelable biometric template protection, including irreversibility, unlinkability, revocability, and robust privacy against various attack scenarios.
Title: Multimodal biometric recognition with cancelable template protection using deep learning and an optimized bloom filter (Applied Intelligence, vol. 56, no. 1)
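The XOR-fusion-plus-bloom-filter step can be sketched generically; the feature vectors below are toy stand-ins for the DAHNet and quantized fingerprint outputs, and the hashing scheme is a plain illustration of a keyed bloom filter, not the paper's optimized variant.

```python
import hashlib

def xor_fuse(a, b):
    """XOR-fuse two equal-length binary feature vectors."""
    return [x ^ y for x, y in zip(a, b)]

def bloom_template(bits, key, size=256, k=3, word=8):
    """Map fixed-length words of the fused vector into a bloom filter.
    The user-specific key makes the template cancelable: issuing a new
    key yields a fresh, unlinkable template from the same biometric."""
    template = [0] * size
    for i in range(0, len(bits), word):
        chunk = "".join(map(str, bits[i:i + word]))
        for j in range(k):
            h = hashlib.sha256(f"{key}|{j}|{chunk}".encode()).digest()
            template[int.from_bytes(h[:4], "big") % size] = 1
    return template

palm = [1, 0, 1, 1] * 16     # stand-in for DAHNet palmprint bits
finger = [0, 1, 1, 0] * 16   # stand-in for quantized fingerprint bits
t1 = bloom_template(xor_fuse(palm, finger), key="user-key-1")
t2 = bloom_template(xor_fuse(palm, finger), key="user-key-2")
print(sum(t1), t1 == t2)
```

Revoking a template amounts to discarding the old key: the same biometric data re-enrolled under a new key produces a template with no usable link to the old one.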
Pub Date: 2025-12-17 | DOI: 10.1007/s10489-025-07033-9
Wenjie Liu, Sen Wang
Time series forecasting has broad applications in fields such as finance, energy management, and weather prediction. PatchTST, a widely recognized model, has significantly improved the input length and accuracy of time series predictions. However, it uniformly treats all patches without considering their varying impacts on future predictions and relies heavily on global attention, often neglecting local information. To address these issues, we have developed an enhanced model: Dynamic Feature Weighting PatchTST (DynamicPatchTST). This model features three key enhancements: (i) a dynamic feature weighting strategy that assigns weights to each patch based on the characteristics of individual time series, emphasizing more impactful patches; (ii) a linear embedding method to replace positional encoding, preventing the over-adjustment or redundant modification of patch weights; (iii) a dual-pathway strategy that integrates local and global information, with the local pathway refined through gated adaptive adjustments to enhance the model’s ability to capture local details. Extensive experiments across multiple standard time series datasets (e.g., ETT, Exchanges, Weather) have demonstrated that our DynamicPatchTST significantly outperforms existing state-of-the-art models, with an average improvement of 4.4% in Mean Squared Error (MSE). This work provides a novel perspective on time series forecasting and paves the way for future advancements in the field.
Title: A dual-pathway transformer utilizing dynamic weighting strategy for time series forecasting (Applied Intelligence, vol. 56, no. 1)
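The patch-weighting idea in enhancement (i) can be illustrated with a small sketch: split a series into overlapping patches, score each patch, and softmax the scores into per-patch weights. The saliency score here (patch variance) is a placeholder; DynamicPatchTST learns its weights from the series.

```python
import numpy as np

def patch_and_weight(series, patch_len=16, stride=8):
    """Hypothetical sketch: split a 1-D series into overlapping patches
    and weight each patch by a softmax over a simple saliency score."""
    n = (len(series) - patch_len) // stride + 1
    patches = np.stack([series[i * stride: i * stride + patch_len]
                        for i in range(n)])
    score = patches.var(axis=1)           # placeholder for learned scores
    w = np.exp(score - score.max())       # numerically stable softmax
    w /= w.sum()
    return patches * w[:, None], w

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 12, 96)) + rng.normal(scale=0.1, size=96)
weighted, w = patch_and_weight(series)
print(weighted.shape, round(w.sum(), 6))
```

Patches with higher scores contribute more to downstream attention, which is the sense in which "more impactful" patches are emphasized.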
Pub Date: 2025-12-16 | DOI: 10.1007/s10489-025-07039-3
Gui Chen, Xianhui Liu, Wenlong Hou, Qiujun Deng
Due to the complementary strengths of language models and knowledge graphs, question-answering systems that combine the two techniques have emerged. However, existing models suffer from issues such as the introduction of excessive misleading information and an unreasonable distribution of heterogeneous representations, which negatively affect answer accuracy and reasoning efficiency. To address these problems, we propose a question-answering model based on split-hop information propagation and heterogeneous representation alignment. First, we train a graph attention network using the split-hop information propagation method along multi-hop reasoning paths, enhancing the relevance between the extracted sub-graphs and the question context while allowing the network to learn differences across reasoning hops. Next, we align the heterogeneous representations of the text and the knowledge graph through post-training, ensuring the pre-trained representations fall into a reasonable parameter distribution. We validate the effectiveness and generalizability of our model on the CommonsenseQA and OpenBookQA datasets in the commonsense domain, as well as the MedQA-USMLE dataset in the biomedical domain, achieving accuracy improvements of 0.5%, 0.6%, and 1.2% over baseline models of the same type, respectively.
Title: A joint question-answering model based on split-hop information propagation and heterogeneous representation alignment (Applied Intelligence, vol. 56, no. 1)
Pub Date: 2025-12-16 | DOI: 10.1007/s10489-025-07036-6
Yanfei Guo, Hangli Du, Zhenhua Zhang, Yuncui Wang, Dapeng Li, Dengwang Li
Diabetic retinopathy (DR) is one of the most common ocular complications in diabetic patients. Therefore, early DR screening is crucial for preventing disease deterioration and enabling timely diagnosis. However, DR grading is challenging because lesions are tiny and unevenly distributed, and the complex pathological relationships among them are difficult to capture. This paper proposes the pathological relationship feature perception and dual attention guided network (PRANet) for DR grading. First, a pathological relationship feature perception module (PRFM) is introduced to capture the pathological relationships between different types of lesions. A lesion detection network is used to locate the rough regions of the lesions, and K-Means clustering is performed on the lesion features to generate lesion nodes. The co-occurrence relationships between lesion nodes are used to construct an adjacency matrix, which is then input into a graph convolutional network to explore the complex relationships among the lesion nodes. Moreover, the EfficientNetV2-M backbone is employed to obtain global information from DR images, which is input into a dual attention module (DAM). The DAM utilizes spatial and channel attention to emphasize the features of suspicious lesion regions. Pathological relationship features, channel features, and spatial features are integrated to achieve precise DR grading. Extensive experiments were carried out on the DDR, APTOS2019 and FGADR datasets. Results show that our method exceeds most existing approaches in classification accuracy and robustness, yielding superior classification performance.
Title: PRANet: Pathological relationship perception and dual attention guided network for diabetic retinopathy grading (Applied Intelligence, vol. 56, no. 1)
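The co-occurrence-to-GCN step described above can be sketched generically: build an adjacency matrix from which lesion types appear together, then apply one symmetrically normalized graph convolution step. The toy data and one-hot features are illustrative, not the paper's lesion nodes.

```python
import numpy as np

def cooccurrence_adjacency(lesion_sets, n_types):
    """Connect lesion types that co-occur in the same image."""
    adj = np.zeros((n_types, n_types))
    for types in lesion_sets:
        for i in types:
            for j in types:
                if i != j:
                    adj[i, j] = 1
    return adj

def gcn_layer(adj, x):
    """One GCN propagation step: D^{-1/2} (A + I) D^{-1/2} X."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm @ x

# Toy data: 4 lesion types; each set lists the types seen in one image.
adj = cooccurrence_adjacency([{0, 1}, {1, 2}, {0, 3}], n_types=4)
x = np.eye(4)   # one-hot node features per lesion type
print(gcn_layer(adj, x).shape)  # (4, 4)
```

After propagation, each lesion node's representation mixes in the features of the types it co-occurs with, which is how the relationship structure reaches the classifier.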
Background: The early and accurate identification of fetal cerebellar hypoplasia (CH) during the prenatal stage is crucial for timely intervention and decision-making. Medical ultrasound represents the primary tool for CH diagnosis. However, the accuracy of CH diagnosis may be limited by imaging artifacts and subjective judgments among sonographers. Artificial intelligence provides an effective tool for improving the diagnostic accuracy and consistency of ultrasound imaging.
Objective: This study aims to develop and validate a self-supervised learning radiomics nomogram (SSRN) that integrates the anatomical structures of the fetal skull, cerebellum, and cistern in order to assess prenatal risk for fetal CH, to identify significant factors that may influence CH diagnosis, and to evaluate the diagnostic efficacy of SSRN in clinical applications.
Method: This retrospective study included clinical data and ultrasound images from 547 normal fetuses and 301 fetuses diagnosed with CH between September 2019 and September 2023 at the Ultrasound Diagnostic Department of Hubei Maternal and Child Health Hospital, China. The standard brain views were selected by experienced sonographers, who also delineated the contours of the skull, cerebellum, and cistern in each view. In the self-supervised learning strategy, VQ-VAE-2 was employed to extract latent features from these brain views, which were then converted into a self-supervised image score (SIS). Radiomics features were extracted from the combined region of interest (ROI) of the cerebellum and cistern to obtain a radiomics score (RS). The proposed SSRN was constructed by integrating significant demographic and morphological features identified through univariate and multivariate logistic regression analyses, along with SIS and RS. To validate the clinical potential of SSRN, several comparative models were established, including an expert model (EM) comprising three sonographers with varying years of clinical experience, a clinical regression model (CLM) based on clinical data, a self-supervised learning classification model (SSL-CM) based on latent features extracted by VQ-VAE-2, and a radiomics model (RM) based on radiomics features.
Results: The study identified several statistically significant influencing factors, including the width of the cistern; the areas of the cistern, cerebellum, and skull; the area ratio between the cistern and cerebellum; SIS; and RS. SSRN was then constructed using these factors, achieving an accuracy of 0.906 and an AUC of 0.956, a significantly enhanced performance in comparison to the alternative models and EM (accuracy: 0.782, AUC: 0.752). Furthermore, SSRN and CLM (accuracy: 0.894, AUC: 0.934), which integrate anatomical structures, performed better than RM (accuracy: 0.871, AUC: 0.934) and SSL-CM (accuracy: 0.622, AUC: 0.641), which lack such incorporation. A subgroup analysis re
{"title":"Self-supervised learning radiomics nomogram integrating anatomical structures can identify cerebellar hypoplasia in prenatal ultrasound","authors":"Ruifan He, Yiling Ma, Xiaoxiao Wu, Fu Liu, Liuyue Li, Tongquan Wu, Guoping Xu, Chen Cheng, Sheng Zhao, Xinglong Wu","doi":"10.1007/s10489-025-06986-1","DOIUrl":"10.1007/s10489-025-06986-1","url":null,"abstract":"<div><p>Background The early and accurate identification of fetal cerebellar hypoplasia (CH) during the prenatal stage is crucial for timely intervention and decision-making. Medical ultrasound represents the primary tool for CH diagnosis. However, the accuracy of CH diagnosis may be limited by imaging artifacts and subjective judgments among sonographers. Artificial intelligence provides an effective tool for improving the diagnostic accuracy and consistency of ultrasound imaging. Objective This study aims to develop and validate a self-supervised learning radiomics nomogram (SSRN) that integrates the anatomical structures of the fetal skull, cerebellum, and cistern in order to assess prenatal risk for fetal CH, to identify significant factors that may influence CH diagnosis, and to evaluate the diagnostic efficacy of SSRN in clinical applications. Method This retrospective study included clinical data and ultrasound images from 547 normal fetuses and 301 fetuses diagnosed with CH between September 2019 and September 2023 at the Ultrasound Diagnostic Department of Hubei Maternal and Child Health Hospital, China. Subsequently, the standard brain views were selected by experienced sonographers, who also delineated the contours of the skull, cerebellum, and cistern in each view. In the self-supervised learning strategy, VQ-VAE-2 was employed to extract the latent features from these brain views, which were then converted into a self-supervised image score (SIS). Radiomics features were extracted from the combined region of interest (ROI) of the cerebellum and cistern to obtain a radiomics score (RS). 
The proposed SSRN was constructed by integrating significant demographic and morphological features identified through univariate and multivariate logistic regression analyses, along with SIS and RS. To validate the clinical potential of SSRN, several comparative models were established: an expert model (EM) comprising three sonographers with varying years of clinical experience, a clinical regression model (CLM) based on clinical data, a self-supervised learning classification model (SSL-CM) based on latent features extracted by VQ-VAE-2, and a radiomics model (RM) based on radiomics features. Results The study identified several statistically significant influencing factors: the width of the cistern; the areas of the cistern, cerebellum, and skull; the area ratio between the cistern and cerebellum; SIS; and RS. SSRN was then constructed from these factors, achieving an accuracy of 0.906 and an AUC of 0.956, significantly outperforming the alternative models, including EM (accuracy: 0.782, AUC: 0.752). Furthermore, SSRN and CLM (accuracy: 0.894, AUC: 0.934), which integrate anatomical structures, performed better than RM (accuracy: 0.871, AUC: 0.934) and SSL-CM (accuracy: 0.622, AUC: 0.641), which lack such incorporation. A subgroup analysis re","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"56 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145754274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
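The nomogram construction described above combines significant clinical/morphological features with the SIS and RS scores through logistic regression. A minimal NumPy sketch of that fusion step, assuming a plain gradient-descent logistic-regression risk model over synthetic stand-ins for the features (cistern width, area ratio, SIS, RS) — all data, weights, and hyperparameters here are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins for four significant factors:
# cistern width, cistern/cerebellum area ratio, SIS, RS
X = rng.normal(size=(n, 4))
true_w = np.array([0.8, 1.2, 1.5, 1.0])
# Synthetic CH-vs-normal labels correlated with the features
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Plain gradient-descent logistic regression as the nomogram's risk model
w, b = np.zeros(4), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted risk per fetus
    w -= 0.5 * (X.T @ (p - y) / n)           # gradient step on weights
    b -= 0.5 * (p - y).mean()                # gradient step on intercept

risk = 1.0 / (1.0 + np.exp(-(X @ w + b)))
acc = ((risk > 0.5) == (y > 0.5)).mean()
```

The fitted weights play the role of the nomogram's point scales: each feature's contribution to the log-odds of CH is linear, which is what makes the model readable as a chart.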
Pub Date : 2025-12-16DOI: 10.1007/s10489-025-07030-y
Zhewen Wang, Tinghuai Ma, Huan Rong, Li Jia
Personalized dialogue systems represent an innovative application in the field of conversational AI, aiming to endow chatbots with distinct personas to address the lack of individuality and specificity in traditional human-computer interactions. Current approaches often fail to incorporate rich external knowledge, making it difficult to maintain coherence and depth in long-term personalized interactions. Thus, to bridge the gap between static persona design and dynamic, knowledge-enhanced personalized dialogue generation, we did the following work: 1) We propose a novel Knowledge-expanded Personalized Dialogue Generation (KPDG) model to extend predefined personas using a commonsense knowledge graph of related personas. During the decoding of generated responses, this method adaptively integrates the most relevant personas, optimally selected and partitioned together with the dialogue history. 2) We design a two-stage prompting approach that leverages large language models (LLMs) for personalized dialogue generation. In the first stage, LLMs are used to enhance and expand the persona, enriching the persona’s intrinsic characteristics and emotional state. In summary, the main contributions of this work are as follows: (a) identification of the key limitations of current persona-based dialogue systems and formulation of a knowledge-enhanced framework to address them; (b) a novel persona expansion approach that combines structured commonsense knowledge with adaptive selection; and (c) a two-stage LLM prompting paradigm that achieves state-of-the-art results without fine-tuning. Experiments on the PERSONA-CHAT dataset demonstrate that our approach outperforms strong baselines in both automatic metrics and human evaluations, validating its effectiveness in enhancing persona diversity, contextual consistency, and conversational engagement.
The novel contribution of this work is the KPDG framework, which first employs a knowledge graph–driven persona expansion module to enrich persona attributes and adaptively select context-relevant traits during decoding. Then, a two-stage LLM prompting strategy is applied: the first stage enhances and diversifies persona characteristics, while the second stage generates responses via in-context learning (ICL) without model retraining. This design not only addresses persona sparsity and consistency issues but also provides a scalable solution adaptable to different dialogue settings.
{"title":"Personalized dialogue generation through knowledge expansion and in-context learning","authors":"Zhewen Wang, Tinghuai Ma, Huan Rong, Li Jia","doi":"10.1007/s10489-025-07030-y","DOIUrl":"10.1007/s10489-025-07030-y","url":null,"abstract":"<div><p>Personalized dialogue systems represent an innovative application in the field of conversational AI, aiming to endow chatbots with distinct personas to address the lack of individuality and specificity in traditional human-computer interactions. Current approaches often fail to incorporate rich external knowledge, making it difficult to maintain coherence and depth in long-term personalized interactions. Thus, to bridge the gap between static persona design and dynamic, knowledge-enhanced personalized dialogue generation, we did the following work: 1) We propose a novel Knowledge-expanded Personalized Dialogue Generation (KPDG) model to extend predefined personas using a commonsense knowledge graph of related personas. During the decoding of generated responses, this method adaptively integrates the most relevant personas, optimally selected and partitioned together with the dialogue history. 2) We design a two-stage prompting approach that leverages large language models (LLMs) for personalized dialogue generation. In the first stage, LLMs are used to enhance and expand the persona, enriching the persona’s intrinsic characteristics and emotional state. In summary, the main contributions of this work are as follows: (a) identification of the key limitations of current persona-based dialogue systems and formulation of a knowledge-enhanced framework to address them; (b) a novel persona expansion approach that combines structured commonsense knowledge with adaptive selection; and (c) a two-stage LLM prompting paradigm that achieves state-of-the-art results without fine-tuning. Experiments on the PERSONA-CHAT dataset demonstrate that our approach outperforms strong baselines in both automatic metrics and human evaluations, validating its effectiveness in enhancing persona diversity, contextual consistency, and conversational engagement. The novel contribution of this work is the KPDG framework, which first employs a knowledge graph–driven persona expansion module to enrich persona attributes and adaptively select context-relevant traits during decoding. Then, a two-stage LLM prompting strategy is applied: the first stage enhances and diversifies persona characteristics, while the second stage generates responses via in-context learning (ICL) without model retraining. This design not only addresses persona sparsity and consistency issues but also provides a scalable solution adaptable to different dialogue settings.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"56 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145754356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
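The two-stage prompting flow the abstract describes can be sketched as follows. `call_llm` is a placeholder for any chat-completion API, and all prompt wording, persona lines, and few-shot examples here are illustrative assumptions, not the paper's actual prompts:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; echoes a canned reply for the demo.
    return f"[LLM output for prompt of {len(prompt)} chars]"

def expand_persona(persona: list[str]) -> str:
    # Stage 1: ask the LLM to enrich the predefined persona with
    # plausible traits and emotional state (hypothetical prompt wording).
    prompt = ("Enrich the following persona with plausible traits and "
              "emotional state, keeping it consistent:\n" + "\n".join(persona))
    return call_llm(prompt)

def generate_reply(expanded_persona: str, history: list[str],
                   examples: list[tuple[str, str]]) -> str:
    # Stage 2: in-context learning — few-shot demonstrations plus the
    # expanded persona condition the response, with no fine-tuning.
    shots = "\n".join(f"User: {u}\nBot: {b}" for u, b in examples)
    prompt = (f"Persona:\n{expanded_persona}\n\nExamples:\n{shots}\n\n"
              "Dialogue:\n" + "\n".join(history) + "\nBot:")
    return call_llm(prompt)

persona = ["I love hiking.", "I have two dogs."]
reply = generate_reply(expand_persona(persona),
                       ["User: Any weekend plans?"],
                       [("Hi!", "Hello! I just got back from a trail walk.")])
```

The design point is that both stages are pure prompting: swapping the backbone LLM only means changing `call_llm`, which is what makes the approach fine-tuning-free.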
In real decision-making scenarios involving multi-agent games, such as intelligent unmanned systems, military confrontations, and autonomous navigation, the coupling of participant strategies and the incompleteness of perceived information make the accurate inference of dynamic game trajectories a critical and challenging task. To address this problem, this paper proposes a game trajectory modeling method that integrates a cross-attention mechanism with a moving diffusion process, termed CARM-Diff (Cross-Attention Autoregressive Moving Diffusion Model). The model combines autoregressive structures with cross-attention to capture the temporal evolution of sequences while explicitly modeling strategic interactions between agents. We design a lightweight feature extraction module and, leveraging the Markov property of diffusion models, introduce a deterministic evolution process of historical states to simulate noise, thereby enhancing the model’s capability to learn local temporal patterns. Meanwhile, a cross-attention mechanism is introduced in the reverse diffusion stage to guide the model in focusing on the opponent’s historical sequential behavior, enabling more precise capture of inter-agent influences. Furthermore, we design a residual gated trajectory modeling structure that fuses the agent’s own behavioral evolution with interaction effects induced by opponents. Gating factors are dynamically generated through multilayer perceptrons to achieve adaptive information fusion. We construct a dynamic trajectory dataset based on an underwater pursuit-evasion game to validate our approach, and the proposed CARM-Diff framework is generalizable to a wide range of multi-agent interactive systems. Experimental results show that CARM-Diff outperforms mainstream baseline methods in both prediction accuracy and dynamic interaction modeling, demonstrating the effectiveness and practical potential of the proposed model.
{"title":"CARM-Diff: a cross-attention guided diffusion model for game trajectory prediction","authors":"Zhiheng Zhang, Lina Lu, Chushu Yi, Tingting Wei, Jing Chen","doi":"10.1007/s10489-025-07019-7","DOIUrl":"10.1007/s10489-025-07019-7","url":null,"abstract":"<div><p>In real decision-making scenarios involving multi-agent games, such as intelligent unmanned systems, military confrontations, and autonomous navigation, the coupling of participant strategies and the incompleteness of perceived information make the accurate inference of dynamic game trajectories a critical and challenging task. To address this problem, this paper proposes a game trajectory modeling method that integrates a cross-attention mechanism with a moving diffusion process, termed CARM-Diff (Cross-Attention Autoregressive Moving Diffusion Model). The model combines autoregressive structures with cross-attention to capture the temporal evolution of sequences while explicitly modeling strategic interactions between agents. We design a lightweight feature extraction module and, leveraging the Markov property of diffusion models, introduce a deterministic evolution process of historical states to simulate noise, thereby enhancing the model’s capability to learn local temporal patterns. Meanwhile, a cross-attention mechanism is introduced in the reverse diffusion stage to guide the model in focusing on the opponent’s historical sequential behavior, enabling more precise capture of inter-agent influences. Furthermore, we design a residual gated trajectory modeling structure that fuses the agent’s own behavioral evolution with interaction effects induced by opponents. Gating factors are dynamically generated through multilayer perceptrons to achieve adaptive information fusion. We construct a dynamic trajectory dataset based on an underwater pursuit-evasion game to validate our approach, and the proposed CARM-Diff framework is generalizable to a wide range of multi-agent interactive systems. 
Experimental results show that CARM-Diff outperforms mainstream baseline methods in both prediction accuracy and dynamic interaction modeling, demonstrating the effectiveness and practical potential of the proposed model.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 18","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-025-07019-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
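The residual gated fusion step described above — an MLP generates gating factors that blend the agent's own behavioral evolution with opponent-induced interaction features — can be sketched roughly as follows. Dimensions, the two-layer MLP, and the residual placement are assumptions for illustration; the paper's exact layers may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
self_feat = rng.normal(size=(8, d))    # agent's own behavioral features
inter_feat = rng.normal(size=(8, d))   # cross-attention interaction features

# Two-layer MLP producing per-element gates from the concatenated features
W1, b1 = rng.normal(scale=0.1, size=(2 * d, d)), np.zeros(d)
W2, b2 = rng.normal(scale=0.1, size=(d, d)), np.zeros(d)

h = np.maximum(np.concatenate([self_feat, inter_feat], axis=-1) @ W1 + b1, 0.0)
gate = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid gate in (0, 1)

# Gated blend of the two streams, plus a residual connection back to the
# agent's own features so self-dynamics are never fully suppressed
fused = gate * self_feat + (1.0 - gate) * inter_feat + self_feat
```

Because the gate is generated from the inputs themselves, the blend is adaptive per timestep and per feature dimension, which is the "dynamic information fusion" the abstract refers to.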