Self-attention models in sequential recommendation face two under-explored but complementary challenges: (1) susceptibility to attention noise, especially in long-sequence modeling, and (2) difficulty in distinguishing between repetition and exploration behaviors under the Softmax bottleneck. Existing methods generally address these problems in isolation, via external denoising or architecture-level branching for repetition-exploration modeling. However, noise in attention weights can obscure true user intent (repetition vs. exploration), while clear intent boundaries can guide more effective denoising, making joint optimization essential yet unexplored in prior work. To tackle both challenges jointly, we propose a unified framework with gated differential amplified attention and repetition-exploration intent modeling (GDA-REIM). GDA-REIM incorporates a gated differential amplified attention (GDAA) module, which employs a three-stage "differentiation-gating-amplification" pipeline: it computes and subtracts paired attention maps to suppress common-mode noise and dynamically rescales the denoised signal. Building on the resulting denoised representations, a partitioned intent scoring (PIS) component together with an intent discrimination margin (IDM) loss explicitly distinguishes repetition from exploration intent. Extensive experiments on the ML-1M, Amazon-Video-Games, and Twitch-100k datasets demonstrate consistent improvements over strong baselines (e.g., approximately +10% in NDCG@10 on ML-1M). Our code is released at https://anonymous.4open.science/r/GDA-REIM/.
{"title":"A unified framework for sequential recommendation with gated differential amplified attention and repetition-exploration intent modeling.","authors":"Jinzhao Su, Shiyu Liu, Shunzhi Yang, Chang-Dong Wang, Shengli Sun, Zhenhua Huang","doi":"10.1016/j.neunet.2026.108721","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108721","url":null,"abstract":"<p><p>Self-attention models in sequential recommendation face two under-explored but complementary challenges: (1) susceptibility to attention noise, especially in long-sequence modeling, and (2) difficulty in distinguishing between repetition and exploration behaviors under the Softmax bottleneck. To jointly tackle these challenges, we propose a unified framework with gated differential amplified attention and repetition-exploration intent modeling (GDA-REIM for short). Existing methods generally address these problems individually, via external denoising or architecture-level branching for repetition-exploration modeling. However, noise in attention weights can obscure true user intent (repetition vs. exploration), while clear intent boundaries can guide more effective denoising, making joint optimization essential yet unexplored in prior work. GDA-REIM incorporates a gated differential amplified attention (GDAA) module, which employs a three-stage \"differentiation-gating-amplification\" pipeline that computes and subtracts paired attention maps to suppress common-mode noise and dynamically rescales the denoised signal. Leveraging the resulting denoised representations, a partitioned intent scoring (PIS) component together with an intent discrimination margin (IDM) loss explicitly distinguishes repetition and exploration intent. Extensive experiments on ML-1M, Amazon-Video-Games, and Twitch-100k datasets demonstrate consistent improvements over strong baselines (e.g., approximately +10% improvement in NDCG@10, or N@10 for short, on ML-1M). Our code is released at https://anonymous.4open.science/r/GDA-REIM/.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108721"},"PeriodicalIF":6.3,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146229159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The core challenge of multi-modal object re-identification (ReID) lies in reconciling the style discrepancies across modalities with the semantic consistency of identity. However, existing methods struggle to effectively separate semantic features from modality-specific styles, so semantic representations are contaminated by noise and recognition performance suffers. To address these issues, we propose a multi-modal re-identification framework based on semantic-stylistic decoupled distillation, named SD2-ReID (Semantic-Stylistic Decoupled Distillation for ReID), which aims to improve modality consistency and cross-modal semantic discrimination. First, we design a Hybrid Multi-modal Feature Extractor (HMFE) that employs a shared shallow structure and modality-specific deep branches for fine-grained feature extraction, improving learning efficiency while preserving modality-specific characteristics. Second, we design a Decoupled Distillation Module (DDM) that explicitly separates semantic and stylistic features through the dual constraints of semantic and style distillation, improving cross-modal semantic consistency and discriminative ability. Finally, we propose an attention-guided masking strategy and integrate intra-modal and cross-modal contrastive learning to construct a Hierarchical Self-supervised Learning Module (HSLM), enhancing the model's robustness to local occlusions and style variations. Together, these components provide synergistic gains in semantic consistency, modality invariance, and feature robustness. Unlike existing methods, SD2-ReID requires no multi-modal fusion module and introduces no additional overhead in the inference phase, balancing recognition performance and inference efficiency. Experiments on three multi-modal object ReID benchmarks fully validate the effectiveness of our method.
{"title":"SD<sup>2</sup>-ReID: A semantic-stylistic decoupled distillation framework for robust multi-modal object re-identification.","authors":"Yonghao Yan, Meijing Gao, Yang Bai, Xu Chen, Bingzhou Sun, Huanyu Sun, Sibo Chen","doi":"10.1016/j.neunet.2026.108719","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108719","url":null,"abstract":"<p><p>The core challenge of multi-modal object re-identification (ReID) lies in reconciling the style discrepancies across different modalities with the semantic consistency of identity. However, existing methods are difficult to effectively separate semantic features from modality-specific styles, resulting in semantic representations being contaminated by noise and affecting recognition performance. To address the above issues, we propose a multi-modal re-identification framework based on semantic-stylistic decoupled distillation, named SD<sup>2</sup>-ReID (Semantic-Stylistic Decoupled Distillation for ReID), aiming to improve modal consistency and cross-modal semantic discrimination. Firstly, we design a Hybrid Multi-modal Feature Extractor (HMFE) that employs a shared shallow structure and modality-specific deep branches to achieve fine-grained feature extraction, thereby improving learning efficiency while preserving modality-specific characteristics; secondly, we design a Decoupled Distillation Module (DDM) that explicitly separates semantic and stylistic features through dual constraints of semantic and style distillation, improving cross-modal semantic consistency and discriminative ability; finally, we propose an attention-guided masking strategy and integrate intra-modal and cross-modal contrastive learning to construct a Hierarchical Self-supervised Learning Module (HSLM), thereby enhancing the model's robustness to local occlusions and style variations.The synergistic enhancement of semantic consistency, modal invariance and feature robustness is finally realized. Unlike existing methods, SD<sup>2</sup>-ReID does not require the design of a multi-modal fusion module and does not introduce additional overhead in the inference phase, while balancing recognition performance and inference efficiency. Experiments on three multi-modal object ReID benchmark test sets fully validate the effectiveness of our method.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108719"},"PeriodicalIF":6.3,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146203526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-10 | DOI: 10.1016/j.neunet.2026.108715
Abdullah Elen, Emre Avuçlu
Deep learning (DL) has achieved remarkable success in processing unstructured data such as images, text, and audio, yet its application to tabular numerical datasets remains challenging due to the lack of inherent spatial structure. In this study, we present a novel approach for transforming numerical tabular data into grayscale image representations, enabling the effective use of convolutional neural networks and other DL architectures on traditionally numerical datasets. The method normalizes features, organizes them into square image matrices, and generates labeled images for classification. Experiments were conducted on four publicly available datasets: Rice MSC Dataset (RMSCD), Optical Recognition of Handwritten Digits (Optdigits), TUNADROMD, and Spambase. Transformed datasets were evaluated using Residual Network (ResNet-18) and Directed Acyclic Graph Neural Network (DAG-Net) models with 5-fold cross-validation. The DAG-Net model achieved accuracies of 99.91% on RMSCD, 99.77% on Optdigits, 98.84% on TUNADROMD, and 93.06% on Spambase, demonstrating the efficacy of the proposed transformation. Additional ablation studies and efficiency analyses highlight improvements in training performance and computational cost. The results indicate that the proposed image-based transformation provides a practical and efficient strategy for integrating numerical datasets into deep learning workflows, broadening the applicability of DL techniques across diverse domains. The implementation is released as open-source software to facilitate reproducibility and further research.
{"title":"Transforming tabular data into images for deep learning models","authors":"Abdullah Elen , Emre Avuçlu","doi":"10.1016/j.neunet.2026.108715","DOIUrl":"10.1016/j.neunet.2026.108715","url":null,"abstract":"<div><div>Deep learning (DL) has achieved remarkable success in processing unstructured data such as images, text, and audio, yet its application to tabular numerical datasets remains challenging due to the lack of inherent spatial structure. In this study, we present a novel approach for transforming numerical tabular data into grayscale image representations, enabling the effective use of convolutional neural networks and other DL architectures on traditionally numerical datasets. The method normalizes features, organizes them into square image matrices, and generates labeled images for classification. Experiments were conducted on four publicly available datasets: Rice MSC Dataset (RMSCD), Optical Recognition of Handwritten Digits (Optdigits), TUNADROMD, and Spambase. Transformed datasets were evaluated using Residual Network (ResNet-18) and Directed Acyclic Graph Neural Network (DAG-Net) models with 5-fold cross-validation. The DAG-Net model achieved accuracies of 99.91% on RMSCD, 99.77% on Optdigits, 98.84% on TUNADROMD, and 93.06% on Spambase, demonstrating the efficacy of the proposed transformation. Additional ablation studies and efficiency analyses highlight improvements in training performance and computational cost. The results indicate that the proposed image-based transformation provides a practical and efficient strategy for integrating numerical datasets into deep learning workflows, broadening the applicability of DL techniques across diverse domains. The implementation is released as open-source software to facilitate reproducibility and further research.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108715"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Neural Networks (GNNs) have achieved strong performance in structured data modeling such as node classification. However, real-world graphs often exhibit heterogeneous neighborhoods and complex feature distributions, while mainstream approaches rely on many learnable parameters and apply uniform aggregation to all neighbors. This lack of explicit modeling for structural diversity often leads to representation homogenization, semantic degradation, and poor adaptability under challenging conditions such as low supervision or class imbalance. To address these limitations, we propose a trainable-parameter-free graph neural network framework, termed the Structural-Diversity Graph Neural Network (SDGNN), which operationalizes structural diversity in message passing. At its core, the Structural-Diversity Message Passing (SDMP) mechanism performs within-group statistics followed by cross-group selection, thereby capturing neighborhood heterogeneity while stabilizing feature semantics. SDGNN further incorporates complementary structure-driven and feature-driven partitioning strategies, together with a normalized-propagation-based global structural enhancer, to enhance adaptability across diverse graphs. Extensive experiments on nine public benchmark datasets and an interdisciplinary PubMed citation network demonstrate that SDGNN consistently outperforms mainstream GNNs, especially under low supervision, class imbalance, and cross-domain transfer. The full implementation, including code and configurations, is publicly available at: https://github.com/mingyue15694/SGDNN/tree/main.
{"title":"Trainable-parameter-free structural-diversity message passing for graph neural networks","authors":"Mingyue Kong, Yinglong Zhang, Chengda Xu, Xuewen Xia, Xing Xu","doi":"10.1016/j.neunet.2026.108711","DOIUrl":"10.1016/j.neunet.2026.108711","url":null,"abstract":"<div><div>Graph Neural Networks (GNNs) have achieved strong performance in structured data modeling such as node classification. However, real-world graphs often exhibit heterogeneous neighborhoods and complex feature distributions, while mainstream approaches rely on many learnable parameters and apply uniform aggregation to all neighbors. This lack of explicit modeling for structural diversity often leads to representation homogenization, semantic degradation, and poor adaptability under challenging conditions such as low supervision or class imbalance. To address these limitations, we propose a trainable-parameter-free graph neural network framework, termed the Structural-Diversity Graph Neural Network (SDGNN), which operationalizes structural diversity in message passing. At its core, the Structural-Diversity Message Passing (SDMP) mechanism performs within-group statistics followed by cross-group selection, thereby capturing neighborhood heterogeneity while stabilizing feature semantics. SDGNN further incorporates complementary structure-driven and feature-driven partitioning strategies, together with a normalized-propagation-based global structural enhancer, to enhance adaptability across diverse graphs. Extensive experiments on nine public benchmark datasets and an interdisciplinary PubMed citation network demonstrate that SDGNN consistently outperforms mainstream GNNs, especially under low supervision, class imbalance, and cross-domain transfer. The full implementation, including code and configurations, is publicly available at: <span><span>https://github.com/mingyue15694/SGDNN/tree/main</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108711"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-10 | DOI: 10.1016/j.neunet.2026.108705
Bingxun Zhao, Xiao Han, Ruihao Sui, Yuan Chen
Research on underwater object detection has primarily focused on addressing degraded imagery. Single-frame feature refinement is inherently limited by restricted static spatial information, while joint image enhancement and detection paradigms encounter non-trivial challenges arising from irreversible artifacts and conflicting optimization objectives. In contrast, temporal information from video sequences offers a direct solution. Temporal semantic information enhances the feature representation of degraded underwater frames, whereas temporal positional cues furnish dynamic geometric associations that facilitate precise object localization. We propose Transformer Underwater Spatial-Temporal Cross-domain Collaborative Detection (TransUTD), reformulating underwater degraded feature representation as a temporal contextual modeling problem. By synergistic exploitation of spatial-temporal information, TransUTD naturally learns complementary features across frames to compensate for feature degradation in key frames, rather than relying on specific heuristic components. This simplifies the detection pipeline and eliminates hand-crafted modules. In our framework, the spatial-temporal fusion encoder aggregates multi-frame features to strengthen semantic representations in degraded images. The spatial-temporal query interaction refines localization in complex underwater scenes by correlating spatial-temporal geometric cues. Finally, the temporal hybrid collaborative decoder performs dense supervision through collaborative optimization of temporal positive queries. Concurrently, we construct UVID, the first underwater video object detection dataset. Experimental evaluations demonstrate that TransUTD achieves state-of-the-art performance, delivering AP improvements of 1.5% and 1.9% on the DUO and UVID datasets, respectively. Moreover, it attains near SOTA performance on ImageNetVID with AP50 of 86.0%. Our dataset and code are available at https://github.com/Anchor1566/TransUTD.
{"title":"TransUTD: Underwater cross-domain collaborative spatial-temporal transformer detector.","authors":"Bingxun Zhao, Xiao Han, Ruihao Sui, Yuan Chen","doi":"10.1016/j.neunet.2026.108705","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108705","url":null,"abstract":"<p><p>Research on underwater object detection has primarily focused on addressing degraded imagery. Single-frame feature refinement is inherently limited by restricted static spatial information, while joint image enhancement and detection paradigms encounter non-trivial challenges arising from irreversible artifacts and conflicting optimization objectives. In contrast, temporal information from video sequences offers a direct solution. Temporal semantic information enhances the feature representation of degraded underwater frames, whereas temporal positional cues furnish dynamic geometric associations that facilitate precise object localization. We propose Transformer Underwater Spatial-Temporal Cross-domain Collaborative Detection (TransUTD), reformulating underwater degraded feature representation as a temporal contextual modeling problem. By synergistic exploitation of spatial-temporal information, TransUTD naturally learns complementary features across frames to compensate for feature degradation in key frames, rather than relying on specific heuristic components. This simplifies the detection pipeline and eliminates hand-crafted modules. In our framework, the spatial-temporal fusion encoder aggregates multi-frame features to strengthen semantic representations in degraded images. The spatial-temporal query interaction refines localization in complex underwater scenes by correlating spatial-temporal geometric cues. Finally, the temporal hybrid collaborative decoder performs dense supervision through collaborative optimization of temporal positive queries. Concurrently, we construct UVID, the first underwater video object detection dataset. Experimental evaluations demonstrate that TransUTD achieves state-of-the-art performance, delivering AP improvements of 1.5% and 1.9% on the DUO and UVID datasets, respectively. Moreover, it attains near SOTA performance on ImageNetVID with AP<sub>50</sub> of 86.0%. Our dataset and code are available at https://github.com/Anchor1566/TransUTD.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108705"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146214778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-10 | DOI: 10.1016/j.neunet.2026.108713
Shuo Han, Qibing Qin, Jinkui Hou, Wenfeng Zhang, Lei Huang
Hashing techniques are widely adopted in large-scale retrieval due to their low time and space complexity. Existing deep cross-modal hashing methods mostly rely on mini-batch training, where only a limited number of samples are processed in each iteration, often resulting in incomplete neighborhood exploration and sub-optimal embedding learning, especially on complex multi-label datasets. To address this limitation, recent studies have optimized the SoftMax loss, which is essentially equivalent to a smoothed triplet constraint with a single center assigned to each class. However, in practical retrieval scenarios, class distributions in the embedding space often contain multiple semantic clusters. Modeling each class with only one center fails to capture these intra-class local structures, thereby widening the semantic gap between heterogeneous samples. To alleviate this issue, we propose a novel deep cross-modal hashing framework, Deep SoftTriple Hashing (DSTH), which learns compact hash codes to better preserve semantic similarities in the embedding space. The framework introduces multiple centers for each class to effectively model the implicit distribution of heterogeneous samples and reduce intra-class semantic variance. To determine the number of centers, a class-center strategy is developed, where similar centers are encouraged to aggregate through an L2,1 regularization to obtain a compact set of centers. In addition, a semantic position quantization loss is introduced to minimize quantization error and enhance the discriminability of binary codes. Extensive experiments on three multi-label datasets demonstrate the effectiveness of DSTH in cross-modal retrieval, achieving absolute mAP improvements of 1.2%-9.3% over strong baselines while consistently maintaining superior performance on PR curves. The source code is available at: https://github.com/QinLab-WFU/DSTH-SoftTriple.
{"title":"Deep Softtriple hashing for Multi-Label cross-modal retrieval.","authors":"Shuo Han, Qibing Qin, Jinkui Hou, Wenfeng Zhang, Lei Huang","doi":"10.1016/j.neunet.2026.108713","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108713","url":null,"abstract":"<p><p>Hashing techniques are widely adopted in large-scale retrieval due to their low time and space complexity. Existing deep cross-modal hashing methods mostly rely on mini-batch training, where only a limited number of samples are processed in each iteration, often resulting in incomplete neighborhood exploration and sub-optimal embedding learning, especially on complex multi-label datasets. To address this limitation, recent studies have optimized the SoftMax loss, which is essentially equivalent to a smoothed triplet constraint with a single center assigned to each class. However, in practical retrieval scenarios, class distributions in the embedding space often contain multiple semantic clusters. Modeling each class with only one center fails to capture these intra-class local structures, thereby widening the semantic gap between heterogeneous samples. To alleviate this issue, we propose a novel deep cross-modal hashing framework, Deep SoftTriple Hashing (DSTH), which learns compact hash codes to better preserve semantic similarities in the embedding space. The framework introduces multiple centers for each class to effectively model the implicit distribution of heterogeneous samples and reduce intra-class semantic variance. To determine the number of centers, a class-center strategy is developed, where similar centers are encouraged to aggregate through an L<sub>2,1</sub> regularization to obtain a compact set of centers. In addition, a semantic position quantization loss is introduced to minimize quantization error and enhance the discriminability of binary codes. Extensive experiments on three multi-label datasets demonstrate the effectiveness of DSTH in cross-modal retrieval, achieving absolute mAP improvements of 1.2%-9.3% over strong baselines while consistently maintaining superior performance on PR curves. The source code is available at: https://github.com/QinLab-WFU/DSTH-SoftTriple.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108713"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146221667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-10 | DOI: 10.1016/j.neunet.2026.108698
Jiashuang Huang, Zhan Su, Shu Jiang, Tao Hou, Mingliang Wang, Weiping Ding
Multi-atlas brain networks offer a more comprehensive and fine-grained view than a single atlas for identifying brain disorders. However, traditional multi-atlas analysis methods rely on simple fusion operations (e.g., addition and concatenation) and account neither for the information redundancy introduced by the larger number of brain regions nor for the uncertainty across atlases. To address this, we propose an effective multi-atlas brain network analysis method with a sparse and uncertain mechanism, called the Sparse and Uncertain Fusion Neural Network (SUFNN). We first construct multi-atlas brain networks from functional magnetic resonance imaging (fMRI) using different atlases. An attention-enhanced module then learns the features of each atlas. These features are fed into a multi-atlas brain-region selection module, which selects disease-related brain regions according to their importance scores, and the model uses the selected features for downstream processing. Finally, an uncertain fusion module estimates the uncertainty of each atlas and applies an uncertainty-aware fusion strategy to obtain results at the evidence level. Experimental results on the SRPBS dataset demonstrate that SUFNN outperforms several state-of-the-art methods in identifying brain disorders.
{"title":"Select then fusion: An effective multi-atlas brain network analysis method with sparse and uncertain mechanism.","authors":"Jiashuang Huang, Zhan Su, Shu Jiang, Tao Hou, Mingliang Wang, Weiping Ding","doi":"10.1016/j.neunet.2026.108698","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108698","url":null,"abstract":"<p><p>Multi-atlas brain networks offer a more comprehensive and intricate understanding than a single atlas in identifying brain disorders. Traditional multi-atlas analysis methods depend on some simple fusion methods (i.e., addition and concatenation) but do not consider the information redundancy caused by increased brain regions and uncertain information between multiple atlases. To address this, we propose an effective multi-atlas brain network analysis method with a sparse and uncertain mechanism, called Sparse and Uncertain Fusion Neural Network (SUFNN). We first construct a multi-atlas brain network based on functional magnetic resonance imaging (fMRI) using different atlases. Then, an attention-enhanced module is used to learn the features of each atlas. These features are fed into the multi-atlas brain regions selection module, which can select disease-related brain regions based on their importance scores. Subsequently, the model employs the selected features for downstream processing. Finally, we employ an uncertain fusion module that determines the uncertainty of each atlas and performs an uncertain fusion strategy to get the results at the evidence level. Experimental results on the SRPBS dataset demonstrate that our SUFNN outperforms several state-of-the-art methods in identifying brain disorders.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108698"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146229340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-10 | DOI: 10.1016/j.neunet.2026.108695
Yanling Du, Ziliang Yang, Baozeng Chang, Jingxia Gao, Xiaojia Bao, Wei Song
Existing unsupervised anomaly detection methods for multivariate time series (MTS) have demonstrated strong performance on numerous public datasets. However, these methods exhibit two critical limitations: (1) the assumption of completely noise-free training data contradicts real-world conditions, where normal samples are inevitably contaminated; and (2) the inherent non-stationarity of MTS induces distribution shift, leading to biased learning and degraded generalization. This paper proposes NORDA, a novel MTS anomaly detection framework that integrates a multi-order difference mechanism with distribution shift optimization. First, the multi-order difference mechanism performs explicit differencing of the raw temporal signals at multiple orders, effectively mitigating noise interference during representation learning. Second, a mixed reversible normalization module is proposed, employing a normalization network with multiple statistical features to dynamically model non-stationary variations across variables. This module removes and restores the non-stationary properties of MTS through a symmetric reversible architecture, thereby enhancing the model's adaptability to distribution shift. By integrating these two modules with a Transformer-based multi-layer encoder, the framework extracts robust latent representations by modeling inter-channel dependencies in the differenced data streams. Extensive experiments on seven benchmark datasets demonstrate that NORDA significantly outperforms sixteen typical baseline methods while exhibiting strong robustness against noise contamination.
{"title":"A noise robust and distribution-adaptive framework for multivariate time series anomaly detection.","authors":"Yanling Du, Ziliang Yang, Baozeng Chang, Jingxia Gao, Xiaojia Bao, Wei Song","doi":"10.1016/j.neunet.2026.108695","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108695","url":null,"abstract":"<p><p>Existing unsupervised anomaly detection methods for multivariate time series(MTS) have demonstrated advanced performance on numerous public datasets. However, these methods exhibit two critical limitations: (1) the assumption of completely noise-free training data contradicts real-world conditions where normal samples are inevitably contaminated; (2) the inherent non-stationarity of MTS induces distribution shift, leading to biased learning and degraded generalization capabilities. This paper proposes NORDA, a novel MTS anomaly detection framework that integrates a multi-order difference mechanism with distribution shift optimization. Firstly, a multi-order difference mechanism performs multi-order explicit differencing on raw temporal signals, effectively mitigating noise interference during representation learning. Secondly, a mixed reversible normalization module is proposed, employing a normalization network with multiple statistical features to dynamically model non-stationary variations across variables. This module achieves remove and restore of non-stationary properties of MTS through a symmetric reversible architecture, thereby enhancing the dynamic adaptability of model to distribution shift. By synergistically integrating the two aforementioned modules with a Transformer-based multi-layer encoder, this framework can extract robust latent representations through modeling of inter-channel dependencies in differentially processed data streams. Extensive experiments on seven benchmark datasets demonstrate that NORDA significantly outperforms sixteen typical baseline methods while exhibiting great robustness against noise contamination.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"108695"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146214761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-10 | DOI: 10.1016/j.neunet.2026.108710
Hongchi Chen, Jifei Tang, Lanhua Xia, Yuanjin Bao
Long-term time series forecasting (LTSF) is critical to industrial applications. While recent advances mainly focus on modeling complex multivariate interactions, the practical benefits can be marginal under challenges such as asynchronous data drifts, noise interference, and abrupt changes. This study demonstrates that a well-designed univariate model can be more effective and efficient. We propose PDCNet, a univariate model with Power Decomposition and online Post-Calibration, which incorporates two novel mechanisms. 1) Power decomposition (P3D) disentangles a time series according to its power distribution, significantly enhancing data predictability and mitigating the obscuring effect of dominant periodicities; a joint predictability-ACF analysis is introduced to determine optimal decomposition thresholds. 2) A lightweight online post-calibration adapts to pattern drifts without retraining the main model, using an abrupt factor M to classify morphological changes in the data and a correction dictionary based on historical performance. Comprehensive experiments show that PDCNet consistently outperforms state-of-the-art models on univariate tasks and, through simple aggregation, also achieves top-tier multivariate performance. P3D brings an average 15% improvement in 68.57% of cases, while calibration further improves accuracy by 16% in calibrated regions. Notably, PDCNet reduces GPU memory usage by up to 95% compared with multivariate counterparts on extreme-scale tasks. Our work shows that capturing the internal temporal dependencies within each variable is a more efficient and practical design for LTSF. PDCNet can serve as a competitive baseline, offering superior performance with a significantly reduced memory footprint. The related source code and configuration files can be accessed at https://github.com/hongchichen/PDCNet.
{"title":"Rethinking multivariate modeling in long-term forecasting: an efficient univariate framework with power decomposition and post-Calibration.","authors":"Hongchi Chen, Jifei Tang, Lanhua Xia, Yuanjin Bao","doi":"10.1016/j.neunet.2026.108710","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108710","url":null,"abstract":"<p><p>Long-term time series forecasting (LTSF) is critical to industrial applications. While recent advances mainly focus on modeling complex multivariate interactions, the practical benefits may only be marginal by challenges such as asynchronous data drifts, noise interference, and abrupt changes. This study demonstrates a well-designed univariate modeling can be more effective and efficient. The univariate model with Power Decomposition and online Post-Calibration (PDCNet) is proposed, which incorporates two novel mechanisms. 1) Power decomposition (P3D) is designed to disentangle time series based on data power distribution, which significantly enhancing data predictability and mitigating the obscuring effect from dominant periodicities. Predictability-ACF joint analysis is introduced to determine optimal decomposition thresholds. 2) By designing the abrupt factor M to classify the data morphological changes and historical performance-based correction dictionary, a lightweight online Post-Calibration is proposed to adapt to pattern drifts without retraining the main model. Comprehensive experiments show that PDCNet consistently outperforms state-of-the-art models on univariate tasks. Through simple aggregation, it also achieves top-tier multivariate performance. P3D brings an average 15% improvement in 68.57% of cases, while calibration further improves accuracy by 16% in calibrated regions. Notably, PDCNet reduces GPU memory usage which can reach up to 95% compared to multivariate counterparts in extreme-scale tasks. Our work proves that capturing internal temporal dependencies within each variable is a more efficient and practical design for LTSF. PDCNet can serve as a competitive baseline, offering superior performance with significantly reduced memory footprint. The related source code and configuration files can be accessed at https://github.com/hongchichen/PDCNet.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"108710"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146221731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view graph clustering, a fundamental task in data mining and machine learning, aims to partition nodes into disjoint groups by leveraging complementary information from multiple data sources. Although significant progress has been made, existing methods often struggle to effectively capture both the unique structural information within each view and the complementary relationships across different views. Moreover, the lack of mechanisms to enforce global semantic consistency frequently results in unstable consensus representations and degraded clustering quality. To address these issues, we propose a novel end-to-end method, Multi-view Graph Clustering via Dual attention fusion and Collaborative optimization (MGCDC). Specifically, each view is first encoded using a graph attention autoencoder to obtain view-specific node embeddings. These embeddings are then integrated via a view-level attention mechanism to generate a unified consensus representation. To guide the learning process, we introduce two collaborative optimization objectives. First, a cross-view cluster alignment loss is employed to jointly perform self-training on both the view-specific and consensus embeddings. Second, a semantic consistency enhancement loss is introduced to maximize mutual information between node embeddings and their corresponding cluster summaries. The entire model is optimized end-to-end by jointly learning node representations, integrating multi-view information, and refining cluster assignments. Extensive experiments on five benchmark datasets demonstrate that MGCDC achieves highly competitive performance compared to state-of-the-art methods.
{"title":"Multi-view graph clustering via dual attention fusion and collaborative optimization.","authors":"Zuowei Wang, Sen Xu, Naixuan Guo, Xuesheng Bian, Xiufang Xu, Shanliang Yao, Xianye Ben, Tian Zhou","doi":"10.1016/j.neunet.2026.108704","DOIUrl":"https://doi.org/10.1016/j.neunet.2026.108704","url":null,"abstract":"<p><p>Multi-view graph clustering, a fundamental task in data mining and machine learning, aims to partition nodes into disjoint groups by leveraging complementary information from multiple data sources. Although significant progress has been made, existing methods often struggle to effectively capture both the unique structural information within each view and the complementary relationships across different views. Moreover, the lack of mechanisms to enforce global semantic consistency frequently results in unstable consensus representations and degraded clustering quality. To address these issues, we propose a novel end-to-end method, Multi-view Graph Clustering via Dual attention fusion and Collaborative optimization (MGCDC). Specifically, each view is first encoded using a graph attention autoencoder to obtain view-specific node embeddings. These embeddings are then integrated via a view-level attention mechanism to generate a unified consensus representation. To guide the learning process, we introduce two collaborative optimization objectives. First, a cross-view cluster alignment loss is employed to jointly perform self-training learning on both the view-specific and consensus embeddings. Second, a semantic consistency enhancement loss is introduced to maximize mutual information between node embeddings and their corresponding cluster summaries. The entire model is optimized end-to-end by jointly learning node representations, integrating multi-view information, and refining cluster assignments. Extensive experiments on five benchmark datasets demonstrate that MGCDC achieves highly competitive performance compared to state-of-the-art methods.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"108704"},"PeriodicalIF":6.3,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146203511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}