Knowledge graph completion (KGC) aims to infer missing facts by learning latent semantic patterns from observed triples. While existing methods learn superficial semantic co-occurrence through classical probabilistic frameworks, they struggle to capture non-classical semantic properties such as entanglements that govern intrinsic correlations between semantics. These entanglements, critical for disambiguating contextual semantics, cannot be represented in classical probabilistic spaces, which lack the mathematical tools to express quantum-like properties. We address this gap with QIKGC, a quantum-informed KGC framework that (i) embeds entity semantics into Hilbert space to explicitly model entanglement, (ii) leverages matrix product states to approximate high-dimensional semantic structures with polynomial complexity, and (iii) treats relations as quantum measurements followed by tomography-based scoring to obtain context-specific entity representations. To our knowledge, this is the first KGC model that unifies semantic entanglement modeling with trainable quantum operators while remaining efficient on classical hardware. Extensive experiments on four benchmarks demonstrate clear quantitative gains over the best baselines, for example increasing MRR from 0.511 to 0.537 on WN18RR and from 0.904 to 0.926 on Kinship.
{"title":"From co-occurrence to coherence: Quantum-informed representation learning for knowledge graph completion","authors":"Mankun Zhao , Bingtao Xu , Jiujiang Guo , Jian Yu , Tianyi Xu , Mei Yu","doi":"10.1016/j.knosys.2026.115408","DOIUrl":"10.1016/j.knosys.2026.115408","url":null,"abstract":"<div><div>Knowledge graph completion (KGC) aims to infer missing facts by learning latent semantic patterns from observed triples. While existing methods learn superficial semantic co-occurrence through classical probabilistic frameworks, they struggle to capture non-classical semantic properties such as entanglements that govern intrinsic correlations between semantics. These entanglements, critical for disambiguating contextual semantics, cannot be represented in classical probabilistic spaces which lack mathematical tools to represent quantum-like properties. We address this gap with QIKGC, a quantum-informed KGC framework that (i) embeds entity semantics into Hilbert space to explicitly model entanglement, (ii) leverages matrix product states to approximate high-dimensional semantic structures with polynomial complexity, and (iii) treats relations as quantum measurements followed by tomography-based scoring to obtain context-specific entity representations. This is, to our knowledge, the first KGC model that unifies semantic entanglement modeling with trainable quantum operators while remaining efficient on classical hardware. Extensive experiments on four benchmarks demonstrate clear quantitative gains, for example increasing MRR from 0.511 to 0.537 on WN18RR and from 0.904 to 0.926 on Kinship over the best baselines.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115408"},"PeriodicalIF":7.6,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.knosys.2026.115356
"Multimodal summarization via coarse-and-fine granularity synergy and region counterfactual reasoning filter" (Knowledge-Based Systems, vol. 337, Article 115356)
Rulong Liu, Qing He, Yuji Wang, Nisuo Du, Zhihao Yang
Multimodal Summarization (MS) generates high-quality summaries by integrating textual and visual information. However, existing MS research faces several challenges, including (1) ignoring fine-grained key information across the visual and textual modalities and its interaction with coarse-grained information, (2) cross-modal semantic inconsistency, which hinders alignment and fusion of the visual and textual feature spaces, and (3) ignoring the inherent heterogeneity of an image when filtering visual information, which causes excessive filtering or excessive retention. To address these issues, we propose the Coarse-and-Fine Granularity Synergy and Region Counterfactual Reasoning Filter (CFCR) for MS. Specifically, we design Coarse-and-Fine Granularity Synergy (CFS) to capture both global (coarse-grained) and important detailed (fine-grained) information in the text and image modalities. Based on this, we design Dual-granularity Contrastive Learning (DCL) to map coarse-grained and fine-grained visual features into the text semantic space, thereby reducing the semantic inconsistency caused by modality differences at both granularity levels and facilitating cross-modal alignment. To address excessive filtering or excessive retention during visual information filtering, we design a Region Counterfactual Reasoning Filter (RCF) that employs counterfactual reasoning to determine the validity of image regions and generate category labels. These labels are then used to train an Image Region Selector that selects regions beneficial for summarization. Extensive experiments on the representative MMSS and MSMO datasets show that CFCR outperforms multiple strong baselines, particularly in selecting and focusing on critical details, demonstrating its effectiveness for MS.
{"title":"Multimodal summarization via coarse-and-fine granularity synergy and region counterfactual reasoning filter","authors":"Rulong Liu , Qing He , Yuji Wang , Nisuo Du , Zhihao Yang","doi":"10.1016/j.knosys.2026.115356","DOIUrl":"10.1016/j.knosys.2026.115356","url":null,"abstract":"<div><div>Multimodal Summarization (MS) generates high-quality summaries by integrating textual and visual information. However, existing MS research faces several challenges, including (1) ignoring fine-grained key information between visual and textual modalities and interaction with coarse-grained information, (2) cross-modal semantic inconsistency, which hinders alignment and fusion of visual and textual feature spaces, and (3) ignoring inherent heterogeneity of an image when filtering visual information, which causes excessive filtering or excessive retention. To address these issues, we propose Coarse-and-Fine Granularity Synergy and Region Counterfactual Reasoning Filter (CFCR) for MS. Specifically, we design Coarse-and-Fine Granularity Synergy (CFS) to capture both global (coarse-grained) and important detailed (fine-grained) information in text and image modalities. Based on this, we design Dual-granularity Contrastive Learning (DCL) for mapping coarse-grained and fine-grained visual features into the text semantic space, thereby reducing semantic inconsistency caused by modality differences at dual granularity levels, and facilitating cross-modal alignment. To address the issue of excessive filtering or excessive retention in visual information filtering, we design a Region Counterfactual Reasoning Filter (RCF) that employs Counterfactual Reasoning to determine the validity of image regions and generate category labels. These labels are then used to train Image Region Selector to select regions beneficial for summarization. Extensive experiments on the representative MMSS and MSMO dataset show that CFCR outperforms multiple strong baselines, particularly in terms of selecting and focusing on critical details, demonstrating its effectiveness in MS.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115356"},"PeriodicalIF":7.6,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.knosys.2026.115360
"Corrigendum to 'A Novel Data-Driven Input Shaping Method Using Residual Impulse Vector Via Unscented Kalman Filter' [Knowledge-Based Systems (2025), Volume 329, Part B, November 2025, 114385]" (Knowledge-Based Systems, vol. 336, Article 115360)
Weiyi Yang, Yuqi Li, Mingsheng Shang, Shuai Li, Shiping Wen
{"title":"Corrigendum to “A Novel Data-Driven Input Shaping Method Using Residual Impulse Vector Via Unscented Kalman Filter” [Knowledge-Based Systems (2025), Volume 329, Part B, November 2025, 114385]","authors":"Weiyi Yang , Yuqi Li , Mingsheng Shang , Shuai Li , Shiping Wen","doi":"10.1016/j.knosys.2026.115360","DOIUrl":"10.1016/j.knosys.2026.115360","url":null,"abstract":"","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115360"},"PeriodicalIF":7.6,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.knosys.2026.115406
"Temporal householder transformation embedding for temporal knowledge graph completion" (Knowledge-Based Systems, vol. 337, Article 115406)
Zhiyu Xu, Kai Lin, Pengpeng Qiu, Tong Shen, Fu Zhang
Knowledge Graph Embedding (KGE) has been widely used to address the incompleteness of Knowledge Graphs (KGs) by predicting missing facts. Temporal Knowledge Graph Embedding (TKGE) extends KGE by incorporating temporal information into fact representations. However, most existing research focuses on static graphs and ignores the temporal dynamics of facts in TKGs, which poses significant challenges for link prediction. Furthermore, current TKGE models still struggle to effectively capture and represent crucial relation patterns, including symmetry, antisymmetry, inversion, composition, and temporal patterns, along with complex relation mapping properties such as 1-to-N, N-to-1, and N-to-N. To overcome these challenges, we propose a Temporal Householder Transformation Embedding model called TeHTE, which fuses temporal information with Householder transformations to effectively capture both static and temporal features within a TKG. In the static module, TeHTE constructs static entity embeddings by reflecting the head entity through a transfer matrix and represents each relation with a pair of vectors to capture relational semantics. In the temporal module, TeHTE integrates temporal information into the entity representation through a time transfer matrix and a shared time window, thereby enhancing its ability to capture temporal features. To further enhance modeling capacity, TeHTE learns a set of Householder transformations parameterized by relations to obtain structural embeddings for entities. Moreover, we theoretically demonstrate the ability of TeHTE to model the above relation patterns and mapping properties. Experimental results on four benchmark datasets indicate that TeHTE substantially surpasses most existing TKGE approaches on temporal link prediction tasks. Ablation studies further validate the contribution of each component within the TeHTE framework.
{"title":"Temporal householder transformation embedding for temporal knowledge graph completion","authors":"Zhiyu Xu , Kai Lin , Pengpeng Qiu , Tong Shen , Fu Zhang","doi":"10.1016/j.knosys.2026.115406","DOIUrl":"10.1016/j.knosys.2026.115406","url":null,"abstract":"<div><div>Knowledge Graph Embedding (KGE) has been widely used to address the incompleteness of Knowledge Graph (KG) by predicting missing facts. Temporal Knowledge Graph Embedding (TKGE) extends KGE by incorporating temporal information into fact representations. However, most existing research focuses on static graphs and ignores the temporal dynamics of facts in TKG, which poses significant challenges for link prediction. Furthermore, current TKGE models still struggle with effectively capturing and representing crucial relation patterns, including <em>symmetry, antisymmetry, inversion, composition</em>, and <em>temporal</em>, along with complex relation mapping properties like 1<em>-to-N, N-to-</em>1, and <em>N-to-N</em>. To overcome these challenges, we propose a <strong>Te</strong>mporal <strong>H</strong>ouseholder <strong>T</strong>ransformation <strong>E</strong>mbedding model called TeHTE, which fuses temporal information with Householder transformation to capture both static and temporal features within TKG effectively. In the static module, TeHTE constructs static entity embeddings by reflecting the head entity through a transfer matrix and represents each relation with a pair of vectors to capture relational semantics. In the temporal module, TeHTE integrates temporal information into the entity representation through the time transfer matrix and shared time window, thereby enhancing its ability to capture temporal features. To further enhance modeling capacity, TeHTE learns a set of Householder transformations parameterized by relations to obtain structural embeddings for entities. Moreover, we theoretically demonstrate the ability of TeHTE to model various relation patterns and mapping properties. Experimental results on four benchmark datasets indicate that TeHTE substantially surpasses most existing TKGE approaches on temporal link prediction tasks. Ablation studies further validate the contribution of each component within the TeHTE framework.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115406"},"PeriodicalIF":7.6,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115364
"Multi-View fusion feature representation learning for drug-target interaction prediction" (Knowledge-Based Systems, vol. 337, Article 115364)
Hua Duan, Junyue Dong, Yufei Zhao, Shiduo Wang, Wenhao Wang
Prediction of Drug-Target Interactions (DTI) is crucial for drug discovery. Heterogeneous graph neural networks (HGNNs) provide an efficient computational approach by modeling complex biological networks, overcoming the high cost and time constraints of traditional experimental methods. However, existing HGNNs primarily rely on meta-path-based topological learning, often overlooking attribute similarities between nodes and inherent structural consistency. This single-perspective learning mechanism limits their ability to leverage multi-source heterogeneous information, resulting in poor generalization, particularly under sparse data scenarios. To address these issues, this paper proposes MV-HGNN, a multi-view fusion model. It learns comprehensive feature embeddings for drugs and proteins from three complementary perspectives: 1) a View-Specific Topology Embedding Module, which captures topology-driven representations through graph propagation and aggregation; 2) a Structure-Consensus-Aware Cross-Domain Alignment Module, which identifies latent structural consistency by mining original node features, thereby compensating for missing topological information in sparse networks; and 3) a Latent Space Semantic Regularization Aggregation Module, which enhances generalization under scarce samples by pulling semantically similar nodes closer in the refined latent embedding space. The complementary features learned from these topological, structural, and semantic views are fused via an adaptive attention mechanism, and the DTI prediction task is formulated as a classification problem on a constructed Drug-Protein Pair (DPP) graph. Experimental results demonstrate that MV-HGNN significantly outperforms existing baseline methods across multiple metrics.
{"title":"Multi-View fusion feature representation learning for drug-target interaction prediction","authors":"Hua Duan, Junyue Dong, Yufei Zhao, Shiduo Wang, Wenhao Wang","doi":"10.1016/j.knosys.2026.115364","DOIUrl":"10.1016/j.knosys.2026.115364","url":null,"abstract":"<div><div>Prediction of Drug-Target Interactions(DTI) is crucial for drug discovery. Heterogeneous graph neural networks(HGNNs) provide an efficient computational approach by modeling complex biological networks, overcoming the high cost and time constraints associated with traditional experimental methods. However, existing HGNNs primarily rely on meta-path-based topological learning, often overlooking attribute similarities between nodes and inherent structural consistency. This single-perspective learning mechanism limits their ability to leverage multi-source heterogeneous information, resulting in poor generalization performance, particularly under sparse data scenarios. To address these issues, this paper proposes MV-HGNN, a multi-view fusion model. It learns comprehensive feature embeddings for drugs and proteins from three complementary perspectives: 1) A View-Specific Topology Embedding Module, which captures topology-driven representations through graph propagation and aggregation; 2) A Structure-Consensus-Aware Cross-Domain Alignment Module, which identifies latent structural consistency by mining original node features, thereby compensating for missing topological information in sparse networks; 3) A Latent Space Semantic Regularization Aggregation Module, which enhances generalization with scarce samples by pulling semantically similar nodes closer in the refined latent embedding space. The complementary features learned from these topological, structural, and semantic views are fused via an adaptive attention mechanism. The DTI prediction task is formulated as a classification problem on a constructed Drug-Protein Pair(DPP) graph. Experimental results demonstrate that MV-HGNN significantly outperforms existing baseline methods across multiple metrics.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115364"},"PeriodicalIF":7.6,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115380
"A generalizable anomaly detection framework with dynamic concept drift suppression for non-stationary time series" (Knowledge-Based Systems, vol. 337, Article 115380)
Licheng Yang, Yu Yao, Daoqing Yang, Wei Yang, Yuming Hao
In practical applications, the performance of industrial data-stream anomaly detection methods often degrades due to concept drift. The core bottleneck is that existing algorithms struggle to dynamically perceive the coupling between data distribution changes and anomaly patterns. This paper proposes a generalized framework for time series anomaly detection based on Dynamic Drift Awareness and Diffusion Enhancement (DDADE). Through real-time distance monitoring and an adaptive incremental model-learning mechanism, it achieves collaborative detection of concept drift and anomaly events. The innovations of this work are as follows. First, a drift detection module based on an industrial-enhanced Mahalanobis distance is designed to capture covariate shift in the feature space in real time. Second, an anomaly detection model based on diffusion enhancement is proposed, which can perform incremental learning or dynamically adjust its threshold according to the drift detection results. Experiments show that on several representative industrial simulation datasets containing drift scenarios, this method outperforms the baseline models.
{"title":"A generalizable anomaly detection framework with dynamic concept drift suppression for non-stationary time series","authors":"Licheng Yang , Yu Yao , Daoqing Yang , Wei Yang , Yuming Hao","doi":"10.1016/j.knosys.2026.115380","DOIUrl":"10.1016/j.knosys.2026.115380","url":null,"abstract":"<div><div>In practical applications, the performance of industrial data stream anomaly detection methods often degrades due to concept drift. The core bottleneck lies in the fact that existing algorithms struggle to dynamically perceive the coupling relationship between data distribution changes and anomaly patterns. This paper proposes a generalized framework for time series anomaly detection based on <u>D</u>ynamic <u>D</u>rift <u>A</u>wareness and <u>D</u>iffusion <u>E</u>nhancement (DDADE). Through real-time distance monitoring and an adaptive model incremental learning mechanism, it achieves collaborative detection of concept drift and anomaly events. Specifically, the innovation of this work is as follows: First, a drift detection module based on the industrial-enhanced Mahalanobis distance is designed to capture the covariate shift in the feature space in real-time. Second, an anomaly detection model based on diffusion enhancement is proposed, which can perform incremental learning or dynamically adjust the threshold according to the drift detection results. Experiments show that in several representative industrial simulation datasets containing drift scenarios, this method outperforms the baseline models.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115380"},"PeriodicalIF":7.6,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146080783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115391
"GEOMR: Integrating image geographic features and human reasoning knowledge for image geolocalization" (Knowledge-Based Systems, vol. 337, Article 115391)
Jian Fang, Siyi Qian, Shaohui Liu
Worldwide image geolocalization aims to accurately predict the geographic location where a given image was captured. Due to the vast scale of the Earth and the uneven distribution of geographic features, this task remains highly challenging. Traditional methods exhibit clear limitations when handling global-scale data. To address these challenges, we propose GEOMR, an effective and adaptive framework that integrates image geographic features and human reasoning knowledge to enhance global geolocalization accuracy. GEOMR consists of two modules. The first module extracts geographic features from images by jointly learning multimodal features. The second module involves training a multimodal large language model in a two-phase process to enhance its geolocalization reasoning capabilities. The first phase learns human geolocalization reasoning knowledge, enabling the model to utilize geographic cues present in images effectively. The second phase focuses on learning how to use reference information to infer the correct geographic coordinates. Extensive experiments conducted on the IM2GPS3K, YFCC4K, and YFCC26K datasets demonstrate that GEOMR significantly outperforms state-of-the-art methods.
{"title":"GEOMR: Integrating image geographic features and human reasoning knowledge for image geolocalization","authors":"Jian Fang , Siyi Qian , Shaohui Liu","doi":"10.1016/j.knosys.2026.115391","DOIUrl":"10.1016/j.knosys.2026.115391","url":null,"abstract":"<div><div>Worldwide image geolocalization aims to accurately predict the geographic location where a given image was captured. Due to the vast scale of the Earth and the uneven distribution of geographic features, this task remains highly challenging. Traditional methods exhibit clear limitations when handling global-scale data. To address these challenges, we propose GEOMR, an effective and adaptive framework that integrates image geographic features and human reasoning knowledge to enhance global geolocalization accuracy. GEOMR consists of two modules. The first module extracts geographic features from images by jointly learning multimodal features. The second module involves training a multimodal large language model in a two-phase process to enhance its geolocalization reasoning capabilities. The first phase learns human geolocalization reasoning knowledge, enabling the model to utilize geographic cues present in images effectively. The second phase focuses on learning how to use reference information to infer the correct geographic coordinates. Extensive experiments conducted on the IM2GPS3K, YFCC4K, and YFCC26K datasets demonstrate that GEOMR significantly outperforms state-of-the-art methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115391"},"PeriodicalIF":7.6,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115349
"Mutual masked image consistency and feature adversarial training for semi-supervised medical image segmentation" (Knowledge-Based Systems, vol. 337, Article 115349)
Wei Li, Linye Ma, Wenyi Zhao, Huihua Yang
Semi-supervised medical image segmentation (SSMIS) aims to alleviate the burden of extensive pixel/voxel-wise annotations by effectively leveraging unlabeled data. While prevalent approaches relying on pseudo-labeling or consistency regularization have shown promise, they are often prone to confirmation bias due to limited feature diversity. Furthermore, existing mixed sampling strategies utilized to expand the training scale frequently generate synthetic data that deviates from real-world distributions, potentially misleading the learning process. To address these challenges, we introduce a novel framework called Mutual Masked Image Consistency and Feature Adversarial Training (MCFAT-Net). Our approach enhances model diversity through a multi-perspective strategy, fostering global-local consistency to improve generalization. Specifically, MCFAT-Net comprises a shared encoder and dual classifiers that leverage Mutual Feature Adversarial Training to inject perturbations, ensuring sub-network divergence and decision boundary smoothness. Moreover, we integrate a dual-level data augmentation strategy: Cross-Set CutMix operating at the inter-sample level to capture global dataset structures, and Mutual Masked Image Consistency operating at the intra-sample level to refine fine-grained local representations. This combination enables the simultaneous capture of pairwise structures across the entire dataset and individual part-object relationships. Extensive experiments on three public datasets demonstrate that MCFAT-Net achieves superior performance compared to state-of-the-art methods.
{"title":"Mutual masked image consistency and feature adversarial training for semi-supervised medical image segmentation","authors":"Wei Li , Linye Ma , Wenyi Zhao , Huihua Yang","doi":"10.1016/j.knosys.2026.115349","DOIUrl":"10.1016/j.knosys.2026.115349","url":null,"abstract":"<div><div>Semi-supervised medical image segmentation (SSMIS) aims to alleviate the burden of extensive pixel/voxel-wise annotations by effectively leveraging unlabeled data. While prevalent approaches relying on pseudo-labeling or consistency regularization have shown promise, they are often prone to confirmation bias due to limited feature diversity. Furthermore, existing mixed sampling strategies utilized to expand the training scale frequently generate synthetic data that deviates from real-world distributions, potentially misleading the learning process. To address these challenges, we introduce a novel framework called Mutual Masked Image Consistency and Feature Adversarial Training (MCFAT-Net). Our approach enhances model diversity through a multi-perspective strategy, fostering global-local consistency to improve generalization. Specifically, MCFAT-Net comprises a shared encoder and dual classifiers that leverage Mutual Feature Adversarial Training to inject perturbations, ensuring sub-network divergence and decision boundary smoothness. Moreover, we integrate a dual-level data augmentation strategy: Cross-Set CutMix operating at the inter-sample level to capture global dataset structures, and Mutual Masked Image Consistency operating at the intra-sample level to refine fine-grained local representations. This combination enables the simultaneous capture of pairwise structures across the entire dataset and individual part-object relationships. Extensive experiments on three public datasets demonstrate that MCFAT-Net achieves superior performance compared to state-of-the-art methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115349"},"PeriodicalIF":7.6,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146080716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115409
"Rethinking heterophilic graph learning via graph curvature" (Knowledge-Based Systems, vol. 337, Article 115409)
Jian Wang, Xingcheng Fu, Qingyun Sun, Li-E Wang, Hao Peng, Jiting Li, Xianxian Li, Minglai Shao
The performance of graph neural networks is limited on heterophilic graphs, since heterophilic connections hinder the transport of supervision signals related to downstream tasks. In recent years, most works based on node-pair heterophily "transform" heterophilic graphs into special homophilic graphs, typically by increasing homophilic connectivity and removing heterophilic edges, thereby converting highly heterophilic graphs into highly homophilic ones. These methods consider only the label difference between node pairs while overlooking changes in the label distributions of their neighborhoods, and they must supply heuristic priors or complex designs to compensate for a limited understanding of how information propagates under heterophily, which leads to the issue of heterophily inconsistency. To address this issue, we build on optimal transport theory to extend the definition of curvature and propose the Heterophily Curvature Graph Representation Learning framework (HetCurv), which simultaneously optimizes the information transport structure and learns better node representations. HetCurv perceives the variation of supervision signals on heterophilic graphs through heterophily curvature and learns the optimal information transport pattern for a specific downstream task. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art baselines across various node classification benchmarks.
{"title":"Rethinking heterophilic graph learning via graph curvature","authors":"Jian Wang , Xingcheng Fu , Qingyun Sun , Li-E Wang , Hao Peng , Jiting Li , Xianxian Li , Minglai Shao","doi":"10.1016/j.knosys.2026.115409","DOIUrl":"10.1016/j.knosys.2026.115409","url":null,"abstract":"<div><div>The performance of graph neural networks is limited on heterophilic graphs since heterophilic connections hinder the transport of supervision signals related to downstream tasks. In recent years, most existing works based on node-pair heterophily “transform” heterophilic graphs into special homophilic graphs, which often increase homophilic connectivity and remove heterophilic edges, thereby converting highly heterophilic graphs into highly homophilic ones. They only consider the label difference between node pairs while overlooking the change in the label distribution between their neighborhoods. They need to provide some heuristic priors or complex designs to alleviate the lack of underlying understanding of the heterophilic information propagation, which leads to the issue of heterophily inconsistency. To address the issue of heterophily inconsistency, based on optimal transport theory, we extend the definition of curvature and propose the Heterophily Curvature Graph Representation Learning framework (<strong>HetCurv</strong>) to optimize the information transport structure and learn better node representations simultaneously. HetCurv perceives the variation of supervision signals on heterophilic graphs through heterophily curvature, and learns the optimal information transport pattern for specific downstream tasks. Extensive experiments demonstrate the superiority of the proposed method in comparison to state-of-the-art baselines across various node classification benchmarks.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115409"},"PeriodicalIF":7.6,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115404
"Multi-modal dual attention graph contrastive learning for recommendation" (Knowledge-Based Systems, vol. 337, Article 115404)
Shouxing Ma, Shiqing Wu, Yawen Zeng, Kaize Shi, Guandong Xu
Multi-modal recommender systems, which incorporate rich content information (e.g., images and texts) into user behavior modeling, have attracted significant attention recently. Current work has successfully combined graph neural networks (GNNs) and contrastive learning to improve recommendation accuracy and mitigate the inherent data sparsity problem. Yet view augmentation strategies borrowed from other domains, such as edge or node dropout, tend to distort the original graph structure, leading to unintended semantic drift and suboptimal representation learning. Moreover, prior work has predominantly focused on optimizing inter-modal weights while overlooking user-specific modality preferences and the adaptation of modal features generated by generic models. To tackle the above issues, we propose a novel multi-mOdal dUal aTtention Graph cOntrastive learning framework (OUTGO). Specifically, we first encode user and item representations with user and item homogeneous GNNs. Then, we employ purpose-designed intra- and inter-attention mechanisms that sequentially and adaptively tune each modal feature value based on the principal loss and fuse the features from different modal perspectives. Additionally, semantic and structural contrastive learning tasks are introduced to alleviate data sparsity without destroying the original data structure. Extensive experiments on real-world datasets demonstrate the superiority of OUTGO compared to state-of-the-art baselines. The code is available at https://github.com/MrShouxingMa/OUTGO.
{"title":"Multi-modal dual attention graph contrastive learning for recommendation","authors":"Shouxing Ma , Shiqing Wu , Yawen Zeng , Kaize Shi , Guandong Xu","doi":"10.1016/j.knosys.2026.115404","DOIUrl":"10.1016/j.knosys.2026.115404","url":null,"abstract":"<div><div>Multi-modal recommender systems, incorporating rich content information (e.g., images and texts) into user behavior modeling, have attracted significant attention recently. Current work has successfully combined graph neural networks (GNNs) and contrastive learning to improve recommendation accuracy and mitigate the inherent sparse data problem. Yet, view augmentation strategies borrowed from other domains-such as edge or node dropout-tend to distort the original graph structure, leading to unintended semantic drift and suboptimal representation learning. Moreover, prior work has predominantly focused on optimizing inter-modal weights while overlooking user-specific modality preferences and adaptation of modal features generated by generic models. To tackle the above issues, we propose a novel multi-m<strong><u>O</u></strong>dal d<strong><u>U</u></strong>al a<strong><u>T</u></strong>tention <strong><u>G</u></strong>raph c<strong><u>O</u></strong>ntrastive learning framework (OUTGO). Specifically, we first encode user and item representations by utilizing user and item homogeneous GNNs. Then, we employ designed intra- and inter-attention mechanisms, sequentially and adaptively, tuning each modal feature value based on the principal loss and considering fusing them with different modal perspectives. Additionally, semantic and structural contrastive learning tasks are introduced to alleviate the sparse data without destroying the original data structure. Extensive experiments on real-world datasets demonstrate the superiority of OUTGO compared to state-of-the-art baselines. The code is available at <span><span>https://github.com/MrShouxingMa/OUTGO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115404"},"PeriodicalIF":7.6,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}