Resting-state functional magnetic resonance imaging (rs-fMRI) provides critical biomarkers for diagnosing neuropsychiatric disorders such as autism spectrum disorder (ASD) and major depressive disorder (MDD). However, existing deep learning models rely heavily on labeled data, limiting their clinical applicability. This study proposes a GIN-Transformer-based pairwise graph contrastive learning framework (GITrans-PairCL) that integrates a Graph Isomorphism Network (GIN) and a Transformer to address data scarcity through unsupervised graph contrastive learning. The framework comprises two key components: a Dual-modal Contrastive Learning (DCL) module and a Task-Driven Fine-tuning (TDF) module. DCL employs sliding-window-augmented rs-fMRI time series, combining a GIN for modeling local spatial connectivity with a Transformer for capturing global temporal dynamics, enabling multi-scale feature extraction via cross-view contrastive learning. TDF adapts the pre-trained model to downstream classification tasks. We conducted single-site and cross-site evaluations on two publicly available datasets, and the experimental results show that GITrans-PairCL outperforms both traditional machine learning and deep learning baselines in the automatic diagnosis of brain diseases. By combining local and global features and leveraging contrastive pre-training, the model reduces its dependence on label information and improves generalization.
{"title":"GIN-transformer based pairwise graph contrastive learning framework","authors":"Shufeng Zhou , Lina Zhou , Yueying Zhou , Hongyan Han , Hongxia Zheng , Lishan Qiao","doi":"10.1016/j.neunet.2026.108621","DOIUrl":"10.1016/j.neunet.2026.108621","url":null,"abstract":"<div><div>Resting-state functional magnetic resonance imaging (rs-fMRI) provides critical biomarkers for diagnosing neuropsychiatric disorders such as autism spectrum disorder (ASD) and major depressive disorder (MDD). However, existing deep learning models heavily rely on labeled data, limiting their clinical applicability. This study proposes a GIN-Transformer-based pairwise graph contrastive learning framework (GITrans-PairCL) that integrates a Graph Isomorphism Network (GIN) and Transformer to address data scarcity through unsupervised graph contrastive learning. The framework comprises two key components: a Dual-modal Contrastive Learning (DCL) module and a Task-Driven Fine-tuning (TDF) module. DCL employs sliding-window augmented rs-fMRI time series, combining GIN for modeling local spatial connectivity and Transformer for capturing global temporal dynamics, enabling multi-scale feature extraction via cross-view contrastive learning. TDF adapts the pre-trained model to downstream classification tasks. We conducted single-site and cross-site evaluation on two publicly available datasets, and the experimental results showed that GITrans-PairCL outperforms both traditional machine learning and deep learning baseline methods in automatic diagnosis of brain diseases. The model combines local and global features, and uses pre-trained contrast learning to reduce the dependence on labeling information and improve generalization.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108621"},"PeriodicalIF":6.3,"publicationDate":"2026-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-17; DOI: 10.1016/j.neunet.2026.108598
Sihang Zhang, Congqi Cao, Qiang Gao, Ganchao Liu
End-to-end visual odometry models have recently achieved localization accuracy on par with conventional techniques, while effectively reducing the occurrence of catastrophic failures. However, these models cannot leverage the complete time-series data for pose adjustment and optimization. Moreover, they use the joint depth-prediction task merely as a scale constraint, without effectively exploiting the depth information itself. In this paper, we propose an end-to-end multi-source visual odometry (MVO) model that dynamically integrates the key components of hybrid visual odometry pipelines into a unified, learnable deep framework. Specifically, we propose TimePoseNet to model the mapping relationship from time to pose, capturing temporal dependencies across the entire sequence. Additionally, a wavelet convolutional attention mechanism is employed to extract global depth information from the depth map, which is then directly embedded into the pose features to dynamically constrain scale ambiguity. Furthermore, temporal and depth cues are jointly incorporated into the post-processing stage of pose estimation. The proposed method attains state-of-the-art performance on both the KITTI benchmark and the newly introduced UAV-2025 dataset, while preserving computational efficiency during inference.
{"title":"Multi-Source Temporal-Depth fusion for robust end-to-End visual odometry","authors":"Sihang Zhang , Congqi Cao , Qiang Gao , Ganchao Liu","doi":"10.1016/j.neunet.2026.108598","DOIUrl":"10.1016/j.neunet.2026.108598","url":null,"abstract":"<div><div>End-to-end visual odometry models have recently achieved localization accuracy on par with conventional techniques, while effectively reducing the occurrence of catastrophic failures. However, the relevant models cannot leverage the complete time-series data for pose adjustment and optimization. Moreover, these models are limited to using joint depth prediction tasks merely as a means of scale constraint, lacking effective utilization of depth information. In this paper, we propose an end-to-end multi-source visual odometry (MVO) model that dynamically integrates the key components of hybrid visual odometry pipelines into a unified, learnable deep framework. Specifically, we propose TimePoseNet to model the mapping relationship from time to pose, capturing temporal dependencies across the entire sequence. Additionally, a wavelet convolutional attention mechanism is employed to extract global depth information from the depth map, which is then directly embedded into the pose features to dynamically constrain scale ambiguity. Furthermore, temporal and depth cues are jointly incorporated into the post-processing stage of pose estimation. The proposed method attains state-of-the-art performance on both the KITTI benchmark and the newly introduced UAV-2025 dataset, while preserving computational efficiency during inference.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108598"},"PeriodicalIF":6.3,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-17; DOI: 10.1016/j.neunet.2026.108619
Chao Zeng, Jiaqi Zhao, Miao Zhang, Li Wang, Weili Guan, Liqiang Nie
Post-Training Quantization (PTQ) has emerged as an effective approach to reduce memory and computational demands during LLM inference. However, existing PTQ methods are highly sensitive to ultra-low-bit quantization and suffer significant performance loss, which is further exacerbated by recently released advanced models such as LLaMA-3 and LLaMA-3.1. To address this challenge, we propose a novel PTQ framework, termed FRM-PTQ, by introducing feature relationship matching. This approach integrates token-level relationship modeling and structure-level distribution alignment based on an intra-block self-distillation framework to effectively mitigate the significant performance degradation caused by low-bit quantization. Unlike conventional MSE loss methods, which focus solely on point-to-point discrepancies, feature relationship matching captures feature representations in high-dimensional spaces to effectively bridge the representation gap between quantized and full-precision blocks. Additionally, we propose a multi-granularity per-group quantization technique featuring a customized kernel, designed based on the quantization sensitivity of decoder blocks, to further relieve the quantization performance degradation. Extensive experimental results demonstrate that our method achieves outstanding performance in the W4A4 low-bit scenario, maintaining near full-precision accuracy while delivering a 2× throughput improvement and a 3.17× memory reduction. This advantage is particularly evident in the latest models, such as LLaMA-3, LLaMA-3.1, and Qwen2.5, as well as in extreme low-bit W3A3 scenarios. Codes are available at https://github.com/HITSZ-Miao-Group/FRM.
{"title":"FRM-PTQ: Feature relationship matching enhanced low-bit post-training quantization for large language models","authors":"Chao Zeng , Jiaqi Zhao , Miao Zhang , Li Wang , Weili Guan , Liqiang Nie","doi":"10.1016/j.neunet.2026.108619","DOIUrl":"10.1016/j.neunet.2026.108619","url":null,"abstract":"<div><div>Post-Training Quantization (PTQ) has emerged as an effective approach to reduce memory and computational demands during LLMs inference. However, existing PTQ methods are highly sensitive to ultra-low-bit quantization with significant performance loss, which is further exacerbated by recently released advanced models like LLaMA-3 and LLaMA-3.1. To address this challenge, we propose a novel PTQ framework, termed <strong>FRM-PTQ</strong>, by introducing feature relationship matching. This approach integrates token-level relationship modeling and structure-level distribution alignment based on the intra-block self-distillation framework to effectively mitigate significant performance degradation caused by low-bit quantization. Unlike conventional MSE loss methods, which focus solely on point-to-point discrepancies, feature relationship matching captures feature representations in high-dimensional spaces to effectively bridge the representation gap between quantized and full-precision blocks. Additionally, we propose a multi-granularity per-group quantization technique featuring a customized kernel, designed based on the quantization sensitivity of decoder block, to further relieve the quantization performance degradation. Extensive experimental results demonstrate that our method achieves outstanding performance in the W4A4 low-bit scenario, maintaining near full-precision accuracy while delivering a 2 × throughput improvement and a 3.17 × memory reduction. This advantage is particularly evident in the latest models such as LLaMA-3, LLaMA-3.1 and Qwen2.5 models, as well as in the W3A3 extreme low-bit scenarios. Codes are available at <span><span>https://github.com/HITSZ-Miao-Group/FRM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108619"},"PeriodicalIF":6.3,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-17; DOI: 10.1016/j.neunet.2026.108609
Lu Zhang, Jisheng Dang, Shu Zhang, Wencheng Gan, Juan Wang, Bin Hu, Gang Feng, Hong Peng
Electroencephalography (EEG) signals contain rich spatiotemporal information reflecting brain activity, making them valuable for analyzing cognitive, emotional, and neurological disorders. However, effectively integrating spatial and temporal information to capture both discriminative and complementary features remains a significant challenge. To address this, we propose a Graph-Enhanced Dual Low-Rank Correlation Embedding (GEDLCE) method, which integrates spatiotemporal EEG features to improve depression recognition. GEDLCE enforces low-rank constraints at both the feature and sample levels, enabling the extraction of shared latent factors across multiple feature sets. To preserve the intrinsic geometric structure of the data, GEDLCE employs two graph Laplacian terms to model local relationships in the sample space. Furthermore, GEDLCE introduces a graph embedding term that utilizes label information to enhance its discriminative capability. In addition, GEDLCE incorporates an enhanced correlation analysis to exploit inter-view correlations while reducing intra-view redundancy. Finally, GEDLCE jointly optimizes low-rank representations, correlation constraints, and graph embedding within a unified framework. Experiments on EEG datasets show that GEDLCE effectively captures critical information, achieves superior performance in depression recognition, and shows promise for early diagnosis and disease monitoring.
{"title":"Graph-enhanced dual low-rank correlation embedding for spatio-temporal EEG fusion in depression recognition.","authors":"Lu Zhang, Jisheng Dang, Shu Zhang, Wencheng Gan, Juan Wang, Bin Hu, Gang Feng, Hong Peng","doi":"10.1016/j.neunet.2026.108609","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108609","url":null,"abstract":"<p><p>Electroencephalography (EEG) signals contain rich spatiotemporal information reflecting brain activity, making them valuable for analyzing cognitive, emotional, and neurological disorders. However, effectively integrating these two types of information to capture both discriminative and complementary features remains a significant challenge. To address this, we propose a Graph-Enhanced Dual Low-Rank Correlation Embedding (GEDLCE) method, which integrates spatiotemporal EEG features to improve depression recognition. GEDLCE enforces low-rank constraints at both feature and sample levels, enabling extraction of shared latent factors across multiple feature sets. To preserve the intrinsic geometric structure of the data, GEDLCE employs two graph Laplacian terms to model local relationships in the sample space. Furthermore, GEDLCE introduces a graph embedding term that utilizes label information to enhance its discriminative capability. In addition, GEDLCE incorporates an enhanced correlation analysis to exploit inter-view correlations while reducing intra-view redundancy. Finally, GEDLCE jointly optimizes low-rank representations, correlation constraints, and graph embedding within a unified framework. Experiments on EEG datasets show that GEDLCE effectively captures critical information, achieves superior performance in depression recognition, and shows promise for early diagnosis and disease monitoring.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108609"},"PeriodicalIF":6.3,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16; DOI: 10.1016/j.neunet.2026.108607
Teng Feng, Junwei Xu, Tao Huang, Zhenyu Wang, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi
Blind Face Restoration (BFR) has garnered considerable attention for its practical applicability to recovering high-quality (HQ) facial images from their degraded versions. Existing BFR methods primarily incorporate diverse priors to mitigate its ill-posed nature. Notably, the codebook prior, which aggregates facial representations from HQ images, has achieved impressive results. However, two performance constraints remain: i) the reliance on a single spatial-domain codebook neglects the potential information in the frequency domain; ii) the commonly used feature-matching strategies overlook the useful identity information encapsulated within the low-quality (LQ) features. To address these issues, we propose CoCoFR, which learns collaborative codebooks in both the spatial and frequency domains and implements adaptive matching between LQ and HQ features with a designed Dual Codebooks Cross Attention (DCCA) module. Additionally, benefiting from the global receptive field and linear complexity of the state space model (Mamba), CoCoFR facilitates coarse-to-fine feature fusion via a simple yet effective Mamba-based fusion block (MFB). Extensive experiments on both synthetic and real-world datasets validate the superiority of CoCoFR in terms of realness and fidelity compared to state-of-the-art methods.
{"title":"CoCoFR: Collaborative codebooks learning with soft matching strategy for blind face restoration","authors":"Teng Feng , Junwei Xu , Tao Huang , Zhenyu Wang , Fangfang Wu , Weisheng Dong , Xin Li , Guangming Shi","doi":"10.1016/j.neunet.2026.108607","DOIUrl":"10.1016/j.neunet.2026.108607","url":null,"abstract":"<div><div>Blind Face Restoration (BFR) has garnered considerable attention for its practical applicability to recover high-quality (HQ) facial images from their degraded versions. Existing BFR methods primarily incorporate diverse priors to mitigate its ill-posed nature. Notably, the codebook prior, which aggregates facial representations from HQ images has achieved impressive results. However, two performance constraints remain: i) The reliance on a single spatial-domain codebook neglects the potential information in the frequency domain. ii) The commonly used feature-matching strategies overlook the valid information encapsulated within the low-quality (LQ) identity features. To address these issues, we propose CoCoFR, which learns collaborative codebooks in both spatial and frequency domains and implements adaptive matching between LQ and HQ features with a designed Dual Codebooks Cross Attention (DCCA) module. Additionally, benefiting from its global receptive fields and linear complexity, CoCoFR facilitates coarse-to-fine feature fusion via a simple yet effective state space model (Mamba)-based fusion block MFB. Extensive experiments on both synthetic and real-world datasets validate the superiority of our CoCoFR in terms of realness and fidelity compared to state-of-the-art methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108607"},"PeriodicalIF":6.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16; DOI: 10.1016/j.neunet.2026.108608
Xi Fu, Weibang Jiang, Rui Liu, Gernot R. Müller-Putz, Cuntai Guan
Accurate decoding of lower-limb motion from EEG signals is essential for advancing brain-computer interface (BCI) applications in movement intent recognition and control. This study presents NeuroDyGait, a two-stage, phase-aware EEG-to-gait decoding framework that explicitly models temporal continuity and domain relationships. To address the challenges of causal, phase-consistent prediction and cross-subject variability, Stage I learns semantically aligned EEG-motion embeddings via relative contrastive learning with a cross-attention-based metric, while Stage II performs domain relation-aware decoding through dynamic fusion of session-specific heads. Comprehensive experiments on two benchmark datasets (GED and FMD) show substantial gains over baselines, including the recent 2025 model EEG2GAIT. The framework generalizes to unseen subjects and maintains inference latency below 5 ms per window, satisfying real-time BCI requirements. Visualization of learned attention and phase-specific cortical saliency maps further reveals interpretable neural correlates of gait phases. Future extensions will target rehabilitation populations and multimodal integration.
{"title":"EEG-to-gait decoding via phase-aware representation learning","authors":"Xi Fu , Weibang Jiang , Rui Liu , Gernot R. Müller-Putz , Cuntai Guan","doi":"10.1016/j.neunet.2026.108608","DOIUrl":"10.1016/j.neunet.2026.108608","url":null,"abstract":"<div><div>Accurate decoding of lower-limb motion from EEG signals is essential for advancing brain-computer interface (BCI) applications in movement intent recognition and control. This study presents <strong>NeuroDyGait</strong>, a two-stage, phase-aware EEG-to-gait decoding framework that explicitly models temporal continuity and domain relationships. To address challenges of causal, phase-consistent prediction and cross-subject variability, Stage I learns semantically aligned EEG-motion embeddings via relative contrastive learning with a cross-attention-based metric, while Stage II performs domain relation-aware decoding through dynamic fusion of session-specific heads. Comprehensive experiments on two benchmark datasets (GED and FMD) show substantial gains over baselines, including a recent 2025 model EEG2GAIT. The framework generalizes to unseen subjects and maintains inference latency below 5 ms per window, satisfying real-time BCI requirements. Visualization of learned attention and phase-specific cortical saliency maps further reveals interpretable neural correlates of gait phases. Future extensions will target rehabilitation populations and multimodal integration.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108608"},"PeriodicalIF":6.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16; DOI: 10.1016/j.neunet.2026.108595
Zhiyu Guo, Yang Liu, Xiang Ao, Yateng Tang, Xinhuan Chen, Xuehao Zheng, Qing He
Graph Transformers (GTs), as emerging foundational encoders for graph-structured data, have shown promising performance due to the integration of local graph structures with global attention mechanisms. However, the complex attention functions and their coupling with graph structures incur significant computational overhead, particularly on large-scale graphs. In this paper, we decouple graph structures from Transformers and propose the Graph-Agnostic Linear Transformer (GALiT). In GALiT, graph structures are used solely to denoise raw node features before training, as our findings reveal that these denoised features already capture the main structural information of the graph and can replace it in guiding the Transformer. By excluding graph structures from the training and inference stages, GALiT serves as a graph-agnostic model that significantly reduces computational complexity. Additionally, we simplify the linear attention functions inherited from traditional Transformers, which further reduces computational overhead while still capturing the relationships between nodes. Through a weighted combination, we integrate the denoised features into the attention mechanism, as our theoretical analysis reveals the key role of the synergy between linear attention and denoised features in enhancing representation diversity. Despite decoupling graph structures and simplifying attention mechanisms, our model surprisingly outperforms most GNNs and GTs on benchmark graphs. Experimental results indicate that GALiT achieves high efficiency while maintaining or even enhancing performance.
{"title":"Graph-Agnostic Linear Transformers","authors":"Zhiyu Guo , Yang Liu , Xiang Ao , Yateng Tang , Xinhuan Chen , Xuehao Zheng , Qing He","doi":"10.1016/j.neunet.2026.108595","DOIUrl":"10.1016/j.neunet.2026.108595","url":null,"abstract":"<div><div>Graph Transformers (GTs), as emerging foundational encoders for graph-structured data, have shown promising performance due to the integration of local graph structures with global attention mechanisms. However, the complex attention functions and their coupling with graph structures incur significant computational overhead, particularly in large-scale graphs. In this paper, we decouple graph structures from Transformers and propose the Graph-Agnostic Linear Transformer (GALiT). In GALiT, graph structures are solely utilized to denoise raw node features before training, as our findings reveal that these denoised features have integrated the main information of the graph structure and can replace it to guide Transformers. By excluding graph structures from the training and inference stages, GALiT serves as a graph-agnostic model which significantly reduces computational complexity. Additionally, we simplify the linear attention functions inherited from traditional Transformers, which further reduces computational overhead while still capturing the relationships between nodes. Through weighted combination, we integrate the denoised features into the attention mechanism, as our theoretical analysis reveals the key role of the synergy between linear attention and denoised features in enhancing representation diversity. Despite decoupling graph structures and simplifying attention mechanisms, our model surprisingly outperforms most GNNs and GTs on benchmark graphs. Experimental results indicate that GALiT achieves high efficiency while maintaining or even enhancing performance.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108595"},"PeriodicalIF":6.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16; DOI: 10.1016/j.neunet.2026.108599
Jinhui Zhu, Xiangfeng Luo, Xiao Wei, Xin Yao
The core of mining point of interest (POI) data is learning user preference representations. However, existing POI sequence learning methods often serve downstream tasks in an end-to-end manner, which limits their ability to support multiple downstream tasks and results in unsatisfactory generalization and performance. Moreover, although POI sequence learning methods use contrastive learning to learn user preference features from positive and negative samples, they fail to consider that negative samples also contain useful characteristics. To improve the generalization and performance of POI sequence learning methods across various downstream tasks, we propose an Adversarial Contrastive with Leveraging Negative Knowledge model (ACLNK). First, we design an adversarial generalizing representation module that captures users' long-term preferences and generates a generalized user historical representation incorporating user social circles. Second, to capture comprehensive short-term preferences from a limited input sequence, we design a negative sample knowledge extraction attention mechanism to absorb knowledge from negative data. Finally, the learned short- and long-term preferences are fed into the contrastive module to generate an accurate, generalizable user representation. We demonstrate the effectiveness and generality of ACLNK on three check-in sequence datasets covering two kinds of downstream tasks. Extensive experiments demonstrate that our proposed model significantly outperforms previous state-of-the-art models. Our code is available at https://github.com/Lucas-Z9277/ACLNK_main.
{"title":"Adversarial contrastive with leveraging negative knowledge for point of interest sequence learning","authors":"Jinhui Zhu , Xiangfeng Luo , Xiao Wei , Xin Yao","doi":"10.1016/j.neunet.2026.108599","DOIUrl":"10.1016/j.neunet.2026.108599","url":null,"abstract":"<div><div>The core of mining point of interest (POI) data is to learn the user preference representation. However, existing POI sequence learning methods often serve downstream tasks in an end-to-end manner. It lacks the ability to support multiple downstream tasks, resulting in unsatisfactory generalization and poor performance. Besides, although POI sequence learning uses contrastive learning to learn the user preferences feature in positive and negative samples, they fail to simultaneously consider that negative samples contain useful characteristics. To improve the generalization and performance of POI sequence learning methods for various downstream tasks, we propose an Adversarial Contrastive with Leveraging Negative Knowledge model (ACLNK). First, we design an adversarial generalizing representation module for capturing the user long-term preferences to generate a generalized user historical representation incorporating user social circles. Second, to capture comprehensive short-term preferences from a limited input sequence, we design a negative sample knowledge extraction attention mechanism to absorb knowledge from negative data. Finally, the learned short- and long-term preferences as the input of the contrastive module to generate the accurate user generalization representation. We demonstrate the effectiveness and generality of ACLNK on three check-in sequence datasets for two kinds of downstream tasks. Extensive experiments demonstrate that our proposed model significantly outperforms previous state-of-the-art models. Our code is available at <span><span>https://github.com/Lucas-Z9277/ACLNK_main</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108599"},"PeriodicalIF":6.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16; DOI: 10.1016/j.neunet.2026.108611
Xianhua Zeng, Yixin Xiang, Jian Zhang, Bowen Lu
For modality translation tasks, diffusion models based on x-prediction offer faster and more accurate image generation than traditional ϵ-prediction. However, they often suffer from training-inference inconsistency (TII), which arises from a mismatch between the Gaussian distribution assumed by the preset noise schedule and the true data distribution. To address this, we propose DSA-Diff, a novel framework that employs dual noise schedules to decouple the training and inference processes. Our approach decomposes the noise schedule along three dimensions: noise sequence, timestep, and correction matrix, and introduces a Bayesian-Greedy Alignment Scheduler (BGAS) to dynamically reconstruct the inference schedule. BGAS combines greedy initialization and Bayesian optimization to align the generated data manifold with the true one. Additionally, we introduce progressive target prediction and multi-scale perceptual alignment to enhance the robustness and detail fidelity of the x-prediction model. Experiments on four datasets show that DSA-Diff achieves high-fidelity image synthesis in only 4–10 adaptive inference steps, with minimal computational cost (68 GFLOPS). It improves the SSIM metric by up to 2.56% on the TFW dataset using only one additional algorithmic module, effectively mitigating TII. Code and models are available at: https://github.com/ElephantOH/DSA-Diff.
{"title":"DSA-Diff: Dynamic schedule alignment for training-Inference consistent modality translation in x-prediction diffusion model","authors":"Xianhua Zeng , Yixin Xiang , Jian Zhang , Bowen Lu","doi":"10.1016/j.neunet.2026.108611","DOIUrl":"10.1016/j.neunet.2026.108611","url":null,"abstract":"<div><div>For modality translation tasks, diffusion models based on x-prediction offer faster and more accurate image generation compared to traditional ϵ-prediction. However, they often suffer from training-inference inconsistency (TII), which arises from a mismatch between the Gaussian distribution assumed by the preset noise schedule and the true data distribution. To address this, we propose DSA-Diff, a novel framework that employs dual noise schedules to decouple the training and inference processes. Our approach decomposes the noise schedule along three dimensions: noise sequence, timestep, and correction matrix, and introduces a Bayesian-Greedy Alignment Scheduler (BGAS) to dynamically reconstruct the inference schedule. BGAS combines greedy initialization and Bayesian optimization to align the generated data manifold with the true one. Additionally, we introduce progressive target prediction and multi-scale perceptual alignment to enhance the robustness and detail fidelity of the x-prediction model. Experiments on four datasets show that DSA-Diff achieves high-fidelity image synthesis in only 4–10 adaptive inference steps, with minimal computational cost (68 GFLOPS). It improves the SSIM metric by up to 2.56% in TFW dataset using only one additional algorithmic module, effectively mitigating TII. Code and models are available at: <span><span>https://github.com/ElephantOH/DSA-Diff</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108611"},"PeriodicalIF":6.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16; DOI: 10.1016/j.neunet.2026.108605
Shuai Zhang, Shan Yang, Wenyu Zhang, Jiahao Nie, Shan Ji
Graph contrastive learning methods based on the information noise contrastive estimation (InfoNCE) loss have made significant advances in graph representation learning. However, existing methods primarily focus on optimizing graph augmentation strategies or contrastive objectives. They cannot effectively eliminate the message contrastive conflict (MCC) that arises from the interaction between the InfoNCE loss and the message-passing mechanism of graph neural networks. The MCC prevents the effective minimization of similarity among negative samples, thereby undermining the efficacy of graph contrastive learning. Furthermore, the issues of false negative samples and the long-tail conflict effect (LCE) under the MCC remain unresolved. To this end, a novel method termed gradient-guided graph contrastive learning for eliminating the message contrastive conflict (G2CL) is proposed. First, this study theoretically demonstrates the existence of the MCC and analyzes in detail the impact of false negative samples and the LCE on the MCC. In addition, a new gradient-guided dynamic capturer is proposed to eliminate the MCC. Next, based on the semantic and topological information of the graph, a new false-negative strategy is proposed to address the issue of false negative samples. Furthermore, a new pheromone-based message-passing mechanism is proposed to address the issue of the LCE. Finally, extensive experiments on 11 datasets demonstrate that G2CL outperforms state-of-the-art baselines.
{"title":"G2CL: Gradient-guided graph contrastive learning for eliminating the message contrastive conflict","authors":"Shuai Zhang, Shan Yang, Wenyu Zhang, Jiahao Nie, Shan Ji","doi":"10.1016/j.neunet.2026.108605","DOIUrl":"10.1016/j.neunet.2026.108605","url":null,"abstract":"<div><div>Graph contrastive learning methods based on the information noise contrastive estimation (InfoNCE) loss have made significant advances in graph representation learning. However, existing methods primarily focus on optimizing graph augmentation strategies or contrastive objectives. They cannot effectively eliminate the message contrastive conflict (MCC) that arises from the collaboration between the InfoNCE loss and the message-passing mechanism of graph neural networks. The MCC prevents the effective minimization of similarity among negative samples, thereby undermining the efficacy of graph contrastive learning. Furthermore, the issues of false negative samples and long-tail conflict effect (LCE) under the MCC remain unresolved. To this end, a novel method termed gradient-guided graph contrastive learning for eliminating the message contrastive conflict (G2CL) is proposed. First, this study theoretically demonstrates the existence of the MCC and analyzes in detail the impact of false negative samples and LCE on the MCC. In addition, a new gradient-guided dynamic capturer is proposed to eliminate the MCC. Next, based on the semantic and topological information of the graph, a new false negative strategy is proposed to address the issue of false negative samples. Furthermore, a new pheromone-based message-passing mechanism is proposed to address the issue of LCE. Finally, extensive experiments on 11 datasets demonstrate that the G2CL outperforms state-of-the-art baselines.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108605"},"PeriodicalIF":6.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}