General expert-guided multi-group graph prompt learning
Pub Date: 2026-01-09 | DOI: 10.1016/j.knosys.2026.115265
Zhuofeng Luo, Xinyan Huang, Runkai He, Yaming Yang
Prompt learning has recently shown great promise in enhancing Graph Neural Networks (GNNs) by enabling efficient task adaptation and better generalization. However, existing methods typically employ a single prompt, which restricts their expressive power and adaptability across diverse graph tasks. To overcome this limitation, we propose the Multi-Group Graph Prompt (MGGP) framework, which introduces multiple learnable prompt groups working collaboratively within a GNN to capture diverse semantic patterns and task cues. To effectively integrate the diverse outputs from these prompt groups, we further design an expert-guided aggregation mechanism. This expert module dynamically weighs and combines predictions from each group, acting as a meta-reasoner that selects and integrates information in a task-aware manner, significantly outperforming naive aggregation strategies such as voting or averaging. Extensive experiments on various node and graph classification benchmarks under both full supervision and few-shot settings demonstrate that MGGP achieves superior accuracy and robustness. Our approach empowers existing graph prompt learning methods with multi-perspective reasoning capabilities.
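The abstract gives no implementation details; as a rough sketch of what gating predictions from multiple prompt groups could look like (all module and variable names below are hypothetical, and the real MGGP expert module may differ substantially), a task-aware weighted combination can be written in a few lines of PyTorch:

```python
import torch
import torch.nn as nn

class ExpertGuidedAggregator(nn.Module):
    """Combine predictions from K prompt groups with learned, input-dependent weights.

    A minimal sketch of the idea described in the abstract, not the paper's design.
    """
    def __init__(self, num_groups: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # Gating network: scores each prompt group from the pooled graph representation.
        self.gate = nn.Linear(hidden_dim, num_groups)

    def forward(self, group_logits: torch.Tensor, graph_repr: torch.Tensor) -> torch.Tensor:
        # group_logits: (batch, K, num_classes) -- one prediction per prompt group
        # graph_repr:   (batch, hidden_dim)     -- shared GNN readout
        weights = torch.softmax(self.gate(graph_repr), dim=-1)    # (batch, K)
        # Weighted combination instead of naive voting/averaging.
        return (weights.unsqueeze(-1) * group_logits).sum(dim=1)  # (batch, num_classes)

agg = ExpertGuidedAggregator(num_groups=4, hidden_dim=64, num_classes=7)
logits = agg(torch.randn(8, 4, 7), torch.randn(8, 64))  # -> (8, 7)
```

Unlike voting or averaging, the weights here depend on the input representation, which is what makes the combination task-aware.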
{"title":"General expert-guided multi-group graph prompt learning","authors":"Zhuofeng Luo , Xinyan Huang , Runkai He , Yaming Yang","doi":"10.1016/j.knosys.2026.115265","DOIUrl":"10.1016/j.knosys.2026.115265","url":null,"abstract":"<div><div>Prompt learning has recently shown great promise in enhancing Graph Neural Networks (GNNs) by enabling efficient task adaptation and better generalization. However, existing methods typically employ a single prompt, which restricts their expressive power and adaptability across diverse graph tasks. To overcome this limitation, we propose the Multi-Group Graph Prompt (MGGP) framework, which introduces multiple learnable prompt groups working collaboratively within a GNN to capture diverse semantic patterns and task cues. To effectively integrate the diverse outputs from these prompt groups, we further design an expert-guided aggregation mechanism. This expert module dynamically weighs and combines predictions from each group, acting as a meta-reasoner that selects and integrates information in a task-aware manner, significantly outperforming naive aggregation strategies such as voting or averaging. Extensive experiments on various node and graph classification benchmarks under both full supervision and few-shot settings demonstrate that MGGP achieves superior accuracy and robustness. Our approach empowers existing graph prompt learning methods with multi-perspective reasoning capabilities.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115265"},"PeriodicalIF":7.6,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HFLMND: Toward robust and efficient hierarchical federated learning via malicious node detection
Pub Date: 2026-01-09 | DOI: 10.1016/j.knosys.2026.115270
Qinglin Bi, Lina Ge, Ming Jiang, Lei Tian, Wenbo Lin
Hierarchical federated learning (HFL) has attracted considerable attention owing to its communication efficiency and cost-effectiveness. However, in high-dimensional, large-scale, zero-trust distributed edge networks, HFL is highly susceptible to attacks from malicious nodes across different layers. Most existing federated learning (FL) defense methods focus on two-layer structures and fail to address complex cross-layer attacks in HFL. Security research for HFL is nascent and lacks comprehensive defenses. We address these challenges by proposing HFLMND, a lightweight and robust defense framework designed to enhance the security and resilience of HFL by accurately identifying malicious nodes. In particular, we design a node similarity feature extraction method that extracts multiple features, such as cosine similarity and Euclidean distance, from model parameters submitted by clients and edge servers. Subsequently, we apply a hierarchical clustering strategy based on these features to detect malicious nodes. In addition, we integrate a historical suspicion score correction mechanism that enhances detection accuracy and stability by accumulating historical detection results. Evaluation results indicate that HFLMND effectively detects and defends against various attacks in HFL and achieves an average detection accuracy of > 90% across multiple attacks, with negligible impact on global model performance.
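As a loose illustration of the detection pipeline described above (similarity features, hierarchical clustering, historical correction), the following sketch flags clients whose updates deviate from the majority; the feature choices, the 0.5 threshold, and the EMA-style correction are assumptions, not the paper's exact design:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def detect_malicious(updates: np.ndarray, history: np.ndarray, alpha: float = 0.7):
    """Flag suspicious nodes from flattened model updates (illustrative sketch)."""
    mean_update = updates.mean(axis=0)
    # Per-node features: cosine similarity and Euclidean distance to the mean update.
    cos = updates @ mean_update / (
        np.linalg.norm(updates, axis=1) * np.linalg.norm(mean_update) + 1e-12)
    dist = np.linalg.norm(updates - mean_update, axis=1)
    feats = np.column_stack([cos, dist])

    # Two-way hierarchical clustering; the smaller cluster is treated as suspicious.
    labels = AgglomerativeClustering(n_clusters=2).fit_predict(feats)
    minority = int(np.bincount(labels).argmin())
    suspicion_now = (labels == minority).astype(float)

    # Historical suspicion correction: smooth detections across rounds.
    history = alpha * history + (1 - alpha) * suspicion_now
    return history > 0.5, history

flags, history = detect_malicious(np.random.randn(10, 1000), np.zeros(10))
```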
{"title":"HFLMND: Toward robust and efficient hierarchical federated learning via malicious node detection","authors":"Qinglin Bi , Lina Ge , Ming Jiang , Lei Tian , Wenbo Lin","doi":"10.1016/j.knosys.2026.115270","DOIUrl":"10.1016/j.knosys.2026.115270","url":null,"abstract":"<div><div>Hierarchical federated learning (HFL) has attracted considerable attention owing to its communication efficiency and cost-effectiveness. However, in high-dimensional, large-scale, zero-trust distributed edge networks, HFL is highly susceptible to attacks from malicious nodes across different layers. Most existing federated learning (FL) defense methods focus on two-layer structures and fail to address complex cross-layer attacks in HFL. Security research for HFL is nascent and lacks comprehensive defenses. We address these challenges by proposing HFLMND, a lightweight and robust defense framework designed to enhance the security and resilience of HFL by accurately identifying malicious nodes. In particular, we design a node similarity feature extraction method that extracts multiple features, such as cosine similarity and Euclidean distance, from model parameters submitted by clients and edge servers. Subsequently, we apply a hierarchical clustering strategy based on these features to detect malicious nodes. In addition, we integrate a historical suspicion score correction mechanism that enhances detection accuracy and stability by accumulating historical detection results. Evaluation results indicate that HFLMND effectively detects and defends against various attacks in HFL and achieves an average detection accuracy of > 90% across multiple attacks, with negligible impact on global model performance.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115270"},"PeriodicalIF":7.6,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lightweight dual-view network for sand-dust degraded image enhancement
Pub Date: 2026-01-09 | DOI: 10.1016/j.knosys.2026.115308
Guxue Gao, Chunyun Sun, Xiaopeng Wen, Yang Xiao, Yuanyuan Wang
To address the issue that current supervised sand-dust image enhancement networks require large numbers of parameters and consume substantial computational resources and storage space, we propose a lightweight dual-view sand-dust image enhancement network. The proposed dual-view sharpening encoder and the original encoder are designed to provide complementary feature information, thereby maximizing the diversity of extracted features. At the encoder stage, a parameter-free feature modulation module is introduced and selectively embedded into the encoder branches to enhance feature extraction capability. In the decoding stage, a contextual attention integration module is designed to improve image contrast and enhance regional details by adaptively leveraging variance-based weighting and long-range pixel dependencies. These modules collectively strengthen feature representation and network reconstruction capacity while significantly reducing parameter overhead. Experimental results demonstrate that the proposed network can effectively enhance sand-dust images with fewer network parameters while maintaining performance. Additionally, the proposed algorithm generalizes well to haze and turbid underwater image enhancement. The processed images also improve the detection accuracy of targets such as vehicles and pedestrians, indicating strong application potential.
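The variance-based weighting idea can be pictured with a small module that boosts detail-rich regions; the kernel size, the sigmoid gating, and the omission of the long-range-dependency branch are simplifying assumptions, not the paper's module:

```python
import torch
import torch.nn as nn

class VarianceWeighting(nn.Module):
    """Re-weight decoder features using local variance as a contrast cue (sketch)."""
    def __init__(self, k: int = 7):
        super().__init__()
        self.pool = nn.AvgPool2d(k, stride=1, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local variance: E[x^2] - E[x]^2 over a k x k neighborhood.
        mu = self.pool(x)
        var = self.pool(x * x) - mu * mu
        # High-variance (detail-rich) regions receive larger weights.
        w = torch.sigmoid(var - var.mean(dim=(2, 3), keepdim=True))
        return x * w

out = VarianceWeighting()(torch.randn(2, 32, 64, 64))  # same shape, details emphasized
```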
{"title":"A lightweight dual-view network for sand-dust degraded image enhancement","authors":"Guxue Gao , Chunyun Sun , Xiaopeng Wen , Yang Xiao , Yuanyuan Wang","doi":"10.1016/j.knosys.2026.115308","DOIUrl":"10.1016/j.knosys.2026.115308","url":null,"abstract":"<div><div>To address the issue that current supervised sand-dust image enhancement networks require large parameters and consume substantial computational resources and storage space, we propose a lightweight dual-view sand-dust image network. The proposed dual-view sharpening encoder and the original encoder are designed to provide complementary feature information, thereby maximizing the diversity of extracted features. At the encoder stage, a parameter-free feature modulation module is introduced and selectively embedded into the encoder branches to enhance feature extraction capability. In the decoding stage, a contextual attention integration module is designed to improve image contrast and enhance regional details by adaptively leveraging variance-based weighting and long-range pixel dependencies. These modules collectively strengthen feature representation and network reconstruction capacity while significantly reducing parameter overhead. Experimental results demonstrate that the proposed network can effectively enhance sand-dust images with fewer network parameters while ensuring performance. Additionally, the proposed algorithm generalizes well to haze and turbid underwater image enhancement. The processed images also improve the detection accuracy of targets such as vehicles and pedestrians, indicating its strong application potential.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115308"},"PeriodicalIF":7.6,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GBGCN: Adaptive granular-ball graph representation and clarity-aware GCN for multi-focus image fusion
Pub Date: 2026-01-08 | DOI: 10.1016/j.knosys.2026.115271
Zhendong Xu, Hao Zhai, Zhi Zeng, Bo Lin, Minyu Deng
Multi-focus image fusion technology aims to combine images taken at different focal lengths into a globally clear all-in-focus image. However, traditional methods and existing deep learning methods still face challenges in balancing global semantic modeling with natural boundary preservation. To address this challenge, this paper proposes a novel method that integrates granular-ball computing with graph convolutional neural networks, constructing a dual-branch hybrid architecture. In the graph convolutional neural network branch, we introduce granular-ball computing theory to represent the image as a series of adaptively generated semantic units (i.e., granular-balls), and employ an iterative optimization strategy guided by a deep clarity map to naturally align the granular-ball distribution with the focused regions of the image. Meanwhile, a clarity-aware graph convolutional network is designed to accurately identify focused areas by integrating multidimensional clarity features with a gating mechanism. In the convolutional neural network branch, a lightweight network is responsible for extracting rich local detail features. The two branches achieve deep collaboration through a multi-level feature interaction mechanism. Experimental results on four public datasets demonstrate that, compared to current mainstream methods, the proposed method shows significant advantages in both qualitative and quantitative evaluations.
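As a toy illustration of clarity-aware gating on granular-ball (node) features, the following sketch modulates node embeddings with a gate computed from clarity cues; the feature set and gate form are assumptions, not GBGCN's actual design:

```python
import torch
import torch.nn as nn

class ClarityGate(nn.Module):
    """Fuse multidimensional clarity cues into a per-node focus weight (sketch)."""
    def __init__(self, in_dim: int, clarity_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(clarity_dim, in_dim), nn.Sigmoid())

    def forward(self, node_feats: torch.Tensor, clarity_feats: torch.Tensor) -> torch.Tensor:
        # node_feats:    (N, in_dim) granular-ball embeddings from the GCN branch
        # clarity_feats: (N, clarity_dim) e.g. gradient energy, local contrast, frequency cues
        return node_feats * self.gate(clarity_feats)

gated = ClarityGate(in_dim=64, clarity_dim=3)(torch.randn(100, 64), torch.randn(100, 3))
```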
{"title":"GBGCN: Adaptive granular-ball graph representation and clarity-aware GCN for multi-focus image fusion","authors":"Zhendong Xu, Hao Zhai, Zhi Zeng, Bo Lin, Minyu Deng","doi":"10.1016/j.knosys.2026.115271","DOIUrl":"10.1016/j.knosys.2026.115271","url":null,"abstract":"<div><div>Multi-focus image fusion technology aims to combine images taken at different focal lengths into a globally clear all-in-focus image. However, traditional methods and existing deep learning methods still face challenges in balancing global semantic modeling with natural boundary preservation. To address this, this paper proposes a novel method that integrates granular-ball computing with graph convolutional neural networks, constructing a dual-branch hybrid architecture. In the graph convolutional neural network branch, we introduce granular-ball computing theory to represent the image as a series of adaptively generated semantic units (i.e., granular-ball), and employ an iterative optimization strategy guided by a deep clarity map to naturally align the granular-ball distribution with the focused regions in the image. Meanwhile, a clarity-aware graph convolutional network is designed to accurately identify focused areas by integrating multidimensional clarity features with a gating mechanism. In the convolutional neural network branch, a lightweight network is responsible for extracting rich local detail features. The two branches achieve deep collaboration through a multi-level feature interaction mechanism. Experimental results on four public datasets demonstrate that, compared to current mainstream methods, the proposed method shows significant advantages in both qualitative and quantitative evaluations.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115271"},"PeriodicalIF":7.6,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple feature similarities based heterogeneous graph representation
Pub Date: 2026-01-08 | DOI: 10.1016/j.knosys.2025.115232
Lan Huang, Yihang Geng, Chenghao Li, Rui Zhang
Classical graph representation learning methods are challenged by the non-negligible variety among nodes and edges, which forces real-world data to be modeled as heterogeneous graphs. Existing studies transform heterogeneous graphs into homogeneous ones either through metapaths or by projecting embeddings directly into latent spaces. This paper proposes a method that transforms complex, heterogeneous data into multiple homogeneous graphs over the target nodes. It captures the semantic features of the different neighbor nodes via multiple feature similarity matrices, complemented by structural features extracted along metapaths. Because the method exploits both the semantic and the structural features of the original heterogeneous graph to represent the target nodes in the final homogeneous graphs, it outperforms most state-of-the-art baseline methods on public datasets.
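One way to picture the transformation is a function that builds one k-nearest-neighbor homogeneous graph per similarity metric over the target nodes; the metric set and the kNN sparsification below are illustrative choices, not the paper's exact construction:

```python
import numpy as np

def similarity_graphs(features: np.ndarray, metrics=("cosine", "euclidean"), k: int = 10):
    """Build one kNN homogeneous graph per similarity metric (illustrative sketch)."""
    n = features.shape[0]
    graphs = {}
    for metric in metrics:
        if metric == "cosine":
            normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
            sim = normed @ normed.T
        else:  # euclidean: negate distance so larger means more similar
            sim = -np.linalg.norm(features[:, None] - features[None, :], axis=-1)
        np.fill_diagonal(sim, -np.inf)            # exclude self-loops
        nbrs = np.argsort(-sim, axis=1)[:, :k]    # top-k most similar nodes per row
        adj = np.zeros((n, n), dtype=np.float32)
        adj[np.arange(n)[:, None], nbrs] = 1.0
        graphs[metric] = np.maximum(adj, adj.T)   # symmetrize
    return graphs

gs = similarity_graphs(np.random.randn(100, 16))  # one adjacency matrix per metric
```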
{"title":"Multiple feature similarities based heterogeneous graph representation","authors":"Lan Huang , Yihang Geng , Chenghao Li , Rui Zhang","doi":"10.1016/j.knosys.2025.115232","DOIUrl":"10.1016/j.knosys.2025.115232","url":null,"abstract":"<div><div>Classical graph representation learning methods are challenged by the non-neglectable variety between nodes and/or between edges in which context the real-world data has to be modeled as heterogeneous graphs. Current researches transform the heterogeneous graphs into homogeneous ones either through metapaths or by direct embeddings projected into the latent spaces. This paper proposes a method to transform the complex various type of data into multiple homogeneous graphs of the target nodes. It captures the semantic feature information of the different neighbor nodes via the multiple feature similarity matrices, and the structural feature information on the metapaths as a complement. Because the method exploits both the semantic and the structural features of the original heterogenous graph to represent the target nodes in the final homogenous graph, it outperforms most of the state of the art baseline methods on public datasets.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115232"},"PeriodicalIF":7.6,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiTrans-CDSR: Bidirectional knowledge transfer for cross-domain sequential recommendation via joint user-item overlap modeling
Pub Date: 2026-01-08 | DOI: 10.1016/j.knosys.2026.115282
Tesfaye Fenta Boka, Zhendong Niu
Cross-domain sequential recommendation (CDSR) systems aim to improve accuracy by leveraging knowledge across multiple domains. While existing approaches focus on user or item overlap independently, three crucial scenarios remain unexplored: user partial overlap with item partial overlap, user partial overlap with item full overlap, and user full overlap with item partial overlap. We introduce BiTrans-CDSR, a novel framework that enables bidirectional knowledge transfer through user and item bridges simultaneously. The framework employs large language models (LLMs) to generate pseudo cross-domain interactions. We propose a dual-bridge contrastive learning mechanism to align user behavioral patterns and item semantic relationships, and a bidirectional relevance-aware meta recall network to adaptively weight user-based and item-based signals for retrieving high-quality pseudo-items. Extensive experiments on three real-world datasets demonstrate that BiTrans-CDSR significantly outperforms state-of-the-art baselines across all three scenarios, with average improvements of 13.7% in NDCG@10 and 15.3% in HR@10. BiTrans-CDSR effectively bridges the gap between user-centric and item-centric knowledge transfer, providing a more comprehensive solution for complex cross-domain recommendation.
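The dual-bridge contrastive mechanism can be pictured as two InfoNCE objectives, one over overlapping users and one over overlapping items; the sketch below shows a single bridge with in-batch negatives (the temperature and pairing scheme are assumptions, not the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def bridge_info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """InfoNCE loss aligning paired embeddings across two domains (sketch).

    Row i of z_a and z_b is the same overlapping user (or item) in domains
    A and B; all other rows serve as in-batch negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau          # (B, B) cross-domain similarity matrix
    targets = torch.arange(z_a.size(0))   # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# The full dual-bridge objective would sum a user bridge and an item bridge:
loss = bridge_info_nce(torch.randn(32, 64), torch.randn(32, 64))
```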
{"title":"BiTrans-CDSR: Bidirectional knowledge transfer for cross-domain sequential recommendation via joint user-item overlap modeling","authors":"Tesfaye Fenta Boka , Zhendong Niu","doi":"10.1016/j.knosys.2026.115282","DOIUrl":"10.1016/j.knosys.2026.115282","url":null,"abstract":"<div><div>Cross-domain sequential recommendation (CDSR) systems aim to improve accuracy by leveraging knowledge across multiple domains. While existing approaches focus on user or item overlap independently, three crucial scenarios remain unexplored: user partial overlap with item partial overlap, user partial overlap with item full overlap, and user full overlap with item partial overlap. We introduce BiTrans-CDSR, a novel framework that enables bidirectional knowledge transfer through user and item bridges simultaneously. The framework employs large language models (LLMs) to generate pseudo cross-domain interactions. We propose a dual-bridge contrastive learning mechanism to align user behavioral patterns and item semantic relationships, and a bidirectional relevance-aware meta recall network to adaptively weight user-based and item-based signals for retrieving high-quality pseudo-items. Extensive experiments on three real-world datasets demonstrate that BiTrans-CDSR significantly outperforms state-of-the-art baselines across all three scenarios, with average improvements of 13.7% in NDCG@10 and 15.3% in HR@10. BiTrans-CDSR effectively bridges the gap between user-centric and item-centric knowledge transfer, providing a more comprehensive solution for complex cross-domain recommendation.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115282"},"PeriodicalIF":7.6,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving the Adam optimizer via projection-based gradient correction in deep learning
Pub Date: 2026-01-08 | DOI: 10.1016/j.knosys.2026.115267
Alaa Luqman Ibrahim, Bayda Ghanim Fathi, Maiwan Bahjat Abdulrazzaq
Deep neural networks (DNNs) are widely used for large-scale learning tasks because of their ability to model complex relationships within data. The Adaptive Moment Estimation (Adam) optimizer is a popular choice for training DNNs; however, its generalization performance can be suboptimal on challenging datasets. To address this limitation, we propose three modified Adam variants (Adam-V1, Adam-V2, and Adam-V3) that incorporate a projection-based gradient-correction mechanism inspired by quasi-Newton and conjugate gradient methods. This correction introduces curvature awareness without requiring full Hessian computations, improving convergence stability and reducing the tendency to settle at sharp or poorly generalizing minima. The proposed methods were systematically evaluated on both low- and high-dimensional tasks, including one- and two-variable non-convex functions, two-dimensional image segmentation, image classification using CNNs on MNIST, CIFAR-10, and the more challenging CIFAR-100 datasets, as well as ResNet-based architectures on CIFAR-10. In addition, robustness on non-stationary real-world signals was assessed through ECG beat classification using the MIT-BIH Arrhythmia dataset. Experimental results demonstrate consistent improvements over baseline Adam. On CNN models trained on MNIST, Adam-V2 achieved the highest accuracy of 97.93%, surpassing standard Adam (96.48%) and highlighting the benefit of combining gradient correction with adaptive step-size adjustment in lower-dimensional settings. For CNNs trained on CIFAR-10, Adam-V3 attained a validation accuracy of 73.59%, improving generalization relative to Adam (72.44%). On the more complex CIFAR-100 dataset, the proposed variants consistently outperformed baseline Adam and recent adaptive optimizers in terms of accuracy and F1-score. Using a ResNet-50 model on CIFAR-10, Adam-V1 reached the highest accuracy of 79.9%, while Adam-V3 achieved the best F1-score of 0.704, demonstrating strong performance in deeper network architectures. These results show that curvature-aware gradient corrections enhance convergence speed, stability, and generalization in deep learning tasks with minimal additional computational overhead. The proposed optimizers offer practical advantages for both shallow and deep architectures, providing a simple and effective improvement to existing adaptive optimization methods.
MDR: Memory distillation and reproduction for personalized dialogue generation
Pub Date: 2026-01-08 | DOI: 10.1016/j.knosys.2025.115252
Pengli Wu, Xuebing Yang, Yanlong Wen, Wensheng Zhang
Personalized dialogue generation requires chatbots to generate dialogue content that matches users' personas and aligns with historical interactions. Long conversations make personalized and coherent responses difficult, and the challenge grows because most current systems generate responses by directly encoding features derived from various personas. To make better use of the correlation between encoded features and actual responses, this paper proposes the Memory Distillation and Reproduction (MDR) framework. For utterance feature encoding, we utilize a student encoder that is trained, via knowledge distillation, to align with and fit the response features encoded by a teacher encoder, enhancing the understanding of underlying personas and complex contexts. For response generation, the decoding process is tailored to the contribution degree of response tokens. MDR thus integrates users' historical dialogues and personalized knowledge to construct up-to-date user profiles. Extensive experiments are conducted on the ConvAI2 and Baidu PersonaChat datasets against 8 strong existing methods under automatic evaluation. The results validate the superiority of MDR in terms of coherence, diversity, and consistency. Notably, MDR achieves BLEU-1 20.33 and Coh-Con.S 38.06 on ConvAI2, and ROUGE-L 30.05 and S-Dist-2 91.23 on Baidu PersonaChat.
Enhancing medical MLLMs with dual vision encoders and MoE-based modality projector
Pub Date: 2026-01-07 | DOI: 10.1016/j.knosys.2026.115275
Feizhong Zhou, Xingyue Liu, Zhiying Yang, Zhipeng Li, Hanguang Xiao
Recent advancements in Medical Multimodal Large Language Models (Med-MLLMs) have primarily concentrated on improving the large language model (LLM) backbone, constructing high-quality multimodal datasets, and extending model architectures. However, other key components, specifically the vision encoder and modality connector, remain underexplored. Most existing Med-MLLMs rely exclusively on high-level visual features from a single vision encoder, which can lead to the loss of fine-grained details and the introduction of visual bias. Furthermore, employing a single MLP as the modality connector enforces a static, single-path mapping between vision and language. This severely limits the model's capacity to achieve robust modality alignment, particularly when handling complex and heterogeneous visual features. To address these limitations, we propose DM-Fuse, a novel feature fusion module. DM-Fuse integrates multi-level features from two complementary vision encoders (CLIP and DINOv2) via adaptive weighting and cross-attention, thereby substantially enhancing visual perception. In addition, we introduce MoE-Projector, a novel modality connector built upon a Mixture-of-Experts (MoE) architecture. It employs a dynamic routing mechanism to selectively activate the most relevant sub-projectors, enabling more adaptive and precise vision-language alignment. Building on these innovations, we develop Agamotto, an efficient Med-MLLM with only 4.6B parameters. Experimental results show that Agamotto substantially outperforms state-of-the-art methods across three medical Visual Question Answering (VQA) benchmarks. This underscores the necessity of jointly optimizing vision encoders and modality connectors to advance Med-MLLM performance. The code has been released on GitHub: https://github.com/NyKxo1/Agamotto.
Dual contrastive learning with behavior pattern modeling for session-based recommendation
Pub Date: 2026-01-07 | DOI: 10.1016/j.knosys.2026.115281
Jiarun Sun, Ling Dai, Ren Guan, Liang Duan
Session-based recommendation (SBR), which provides personalized predictions based on anonymous users' short-term clicks, has recently gained widespread attention. However, many SBR models overlook the joint extraction of explicit and implicit feedback, leading to biases in user behavior modeling. Meanwhile, most methods fail to fully leverage the latent information in repeated items and click orders within sessions, exacerbating the negative effects of data sparsity in SBR. To address these issues, we propose the Dual Contrastive Learning with Behavior Pattern Modeling (DCL-BPM) method, which maximizes the use of short-term session information while extracting long-range user dependencies for recommendation. Specifically, we first employ GGNN and E-GNN to extract implicit and explicit feedback separately, effectively combining them to construct an accurate dynamic user profile. We then add the filtered session embeddings to prevent data loss caused by gradient mismatch. To better capture user preferences, we design a Dual Contrastive Loss (DCL) framework that constructs negative samples through deduplication and random reshuffling, highlighting the critical role of item frequency and click order in positive samples during training. DCL is not tied to a particular network architecture, making it easily adaptable to diverse SBR scenarios. Extensive experiments on three representative datasets demonstrate the effectiveness of our model and its practical value in real-world applications.
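The negative-sample construction is concrete enough to sketch: one negative view removes repeated items (deduplication) and another destroys the click order (random reshuffling); how the resulting session embeddings enter the loss below is an illustrative choice, not DCL-BPM's exact formulation:

```python
import random
import torch
import torch.nn.functional as F

def dcl_negatives(session):
    """Build the two negative views suggested by the abstract (sketch)."""
    dedup = list(dict.fromkeys(session))   # drop repetitions, keep first occurrence
    shuffled = session[:]
    random.shuffle(shuffled)               # destroy the click order
    return dedup, shuffled

def dual_contrastive_loss(anchor, positive, neg_dedup, neg_shuf, tau: float = 0.2):
    # anchor/positive/negatives: (dim,) session embeddings from any session encoder
    sims = torch.stack([
        F.cosine_similarity(anchor, positive, dim=0),
        F.cosine_similarity(anchor, neg_dedup, dim=0),
        F.cosine_similarity(anchor, neg_shuf, dim=0),
    ]) / tau
    # The positive pair occupies index 0.
    return F.cross_entropy(sims.unsqueeze(0), torch.tensor([0]))

print(dcl_negatives([5, 3, 5, 7, 3]))  # ([5, 3, 7], a random permutation of the session)
```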
{"title":"Dual contrastive learning with behavior pattern modeling for session-based recommendation","authors":"Jiarun Sun , Ling Dai , Ren Guan , Liang Duan","doi":"10.1016/j.knosys.2026.115281","DOIUrl":"10.1016/j.knosys.2026.115281","url":null,"abstract":"<div><div>Session-based recommendation (SBR) that provides personalized predictions based on anonymous users’ short-term clicks has recently gained widespread attention. Nowadays, numerous SBR models overlook the joint extraction of explicit and implicit feedback, leading to biases in user behavior modeling. Meanwhile, most methods fail to fully leverage the latent information in repeated items and click orders within sessions, exacerbating the negative effects of data sparsity in SBR. To address the above issues, we propose the Dual Contrastive Learning with Behavior Pattern Modeling (<strong>DCL-BPM</strong>) method, which maximizes the use of short-term session information while extracting long-range user dependencies for recommendation. Specifically, we first employ GGNN and E-GNN to extract implicit and explicit feedback separately, effectively combining them to construct an accurate dynamic user profile. We then add the filtered session embeddings to prevent data loss caused by gradient mismatch. To better capture user preferences, we design a Dual Contrastive Loss (<strong>DCL</strong>) framework that constructs negative samples through deduplication and random reshuffling, highlighting the critical role of item frequency and click orders in positive samples during training. DCL is not limited by the network architecture, making it easily adaptable to diverse scenarios in SBR. Extensive experiments on three representative datasets demonstrate the effectiveness of our model and its practical value in real-world applications.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115281"},"PeriodicalIF":7.6,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}