
Neural Networks: Latest Publications

A unified framework for sequential recommendation with gated differential amplified attention and repetition-exploration intent modeling.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-11 DOI: 10.1016/j.neunet.2026.108721
Jinzhao Su, Shiyu Liu, Shunzhi Yang, Chang-Dong Wang, Shengli Sun, Zhenhua Huang

Self-attention models in sequential recommendation face two under-explored but complementary challenges: (1) susceptibility to attention noise, especially in long-sequence modeling, and (2) difficulty in distinguishing between repetition and exploration behaviors under the Softmax bottleneck. To jointly tackle these challenges, we propose a unified framework with gated differential amplified attention and repetition-exploration intent modeling (GDA-REIM for short). Existing methods generally address these problems individually, via external denoising or architecture-level branching for repetition-exploration modeling. However, noise in attention weights can obscure true user intent (repetition vs. exploration), while clear intent boundaries can guide more effective denoising, making joint optimization essential yet unexplored in prior work. GDA-REIM incorporates a gated differential amplified attention (GDAA) module, which employs a three-stage "differentiation-gating-amplification" pipeline that computes and subtracts paired attention maps to suppress common-mode noise and dynamically rescales the denoised signal. Leveraging the resulting denoised representations, a partitioned intent scoring (PIS) component together with an intent discrimination margin (IDM) loss explicitly distinguishes repetition and exploration intent. Extensive experiments on ML-1M, Amazon-Video-Games, and Twitch-100k datasets demonstrate consistent improvements over strong baselines (e.g., approximately +10% improvement in NDCG@10, or N@10 for short, on ML-1M). Our code is released at https://anonymous.4open.science/r/GDA-REIM/.
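
The abstract does not spell out the "differentiation-gating-amplification" equations, so the following is only a minimal PyTorch sketch of the core idea: two paired attention maps are computed and subtracted so that noise common to both cancels, and the result is gated and rescaled. The specific gate (a sigmoid over the inputs), the learnable amplification scale, and all layer names here are assumptions for illustration, not the authors' exact GDAA design.

```python
import torch
import torch.nn as nn


class DifferentialAttentionSketch(nn.Module):
    """Minimal sketch of a 'differentiation-gating-amplification' attention step.

    Two attention maps from paired query/key projections are subtracted so that
    noise shared by both maps (common-mode noise) cancels; a gate and a learnable
    scale then rescale the denoised signal. All design details are assumptions.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q1 = nn.Linear(d_model, d_model)
        self.q2 = nn.Linear(d_model, d_model)
        self.k1 = nn.Linear(d_model, d_model)
        self.k2 = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, 1)        # assumed gating form
        self.amp = nn.Parameter(torch.ones(1))   # assumed amplification scale
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) sequence of item embeddings
        a1 = torch.softmax(self.q1(x) @ self.k1(x).transpose(-2, -1) * self.scale, dim=-1)
        a2 = torch.softmax(self.q2(x) @ self.k2(x).transpose(-2, -1) * self.scale, dim=-1)
        diff = a1 - a2                            # differentiation: cancel shared noise
        g = torch.sigmoid(self.gate(x))           # gating: per-position weight in (0, 1)
        return self.amp * g * (diff @ self.v(x))  # amplification of the denoised signal


if __name__ == "__main__":
    x = torch.randn(2, 20, 64)                       # 2 sequences of 20 items, 64-dim embeddings
    print(DifferentialAttentionSketch(64)(x).shape)  # torch.Size([2, 20, 64])
```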

Citations: 0
SD2-ReID: A semantic-stylistic decoupled distillation framework for robust multi-modal object re-identification.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-11 DOI: 10.1016/j.neunet.2026.108719
Yonghao Yan, Meijing Gao, Yang Bai, Xu Chen, Bingzhou Sun, Huanyu Sun, Sibo Chen

The core challenge of multi-modal object re-identification (ReID) lies in reconciling the style discrepancies across different modalities with the semantic consistency of identity. However, existing methods struggle to effectively separate semantic features from modality-specific styles, so semantic representations are contaminated by noise and recognition performance suffers. To address these issues, we propose a multi-modal re-identification framework based on semantic-stylistic decoupled distillation, named SD2-ReID (Semantic-Stylistic Decoupled Distillation for ReID), aiming to improve modal consistency and cross-modal semantic discrimination. Firstly, we design a Hybrid Multi-modal Feature Extractor (HMFE) that employs a shared shallow structure and modality-specific deep branches to achieve fine-grained feature extraction, thereby improving learning efficiency while preserving modality-specific characteristics; secondly, we design a Decoupled Distillation Module (DDM) that explicitly separates semantic and stylistic features through dual constraints of semantic and style distillation, improving cross-modal semantic consistency and discriminative ability; finally, we propose an attention-guided masking strategy and integrate intra-modal and cross-modal contrastive learning to construct a Hierarchical Self-supervised Learning Module (HSLM), thereby enhancing the model's robustness to local occlusions and style variations. Together, these components realize the synergistic enhancement of semantic consistency, modal invariance, and feature robustness. Unlike existing methods, SD2-ReID does not require a multi-modal fusion module and introduces no additional overhead in the inference phase, balancing recognition performance and inference efficiency. Experiments on three multi-modal object ReID benchmark test sets fully validate the effectiveness of our method.

Citations: 0
Transforming tabular data into images for deep learning models
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108715
Abdullah Elen, Emre Avuçlu
Deep learning (DL) has achieved remarkable success in processing unstructured data such as images, text, and audio, yet its application to tabular numerical datasets remains challenging due to the lack of inherent spatial structure. In this study, we present a novel approach for transforming numerical tabular data into grayscale image representations, enabling the effective use of convolutional neural networks and other DL architectures on traditionally numerical datasets. The method normalizes features, organizes them into square image matrices, and generates labeled images for classification. Experiments were conducted on four publicly available datasets: Rice MSC Dataset (RMSCD), Optical Recognition of Handwritten Digits (Optdigits), TUNADROMD, and Spambase. Transformed datasets were evaluated using Residual Network (ResNet-18) and Directed Acyclic Graph Neural Network (DAG-Net) models with 5-fold cross-validation. The DAG-Net model achieved accuracies of 99.91% on RMSCD, 99.77% on Optdigits, 98.84% on TUNADROMD, and 93.06% on Spambase, demonstrating the efficacy of the proposed transformation. Additional ablation studies and efficiency analyses highlight improvements in training performance and computational cost. The results indicate that the proposed image-based transformation provides a practical and efficient strategy for integrating numerical datasets into deep learning workflows, broadening the applicability of DL techniques across diverse domains. The implementation is released as open-source software to facilitate reproducibility and further research.
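
The transformation the abstract describes (normalize features, arrange them into square image matrices) is simple enough to sketch directly. In the snippet below, the zero padding used to reach the next perfect square and the scaling to 8-bit grayscale intensities are assumptions; the paper's exact choices may differ.

```python
import math

import numpy as np


def tabular_rows_to_images(X: np.ndarray) -> np.ndarray:
    """Turn each numerical row of X (n_samples, n_features) into a square
    grayscale image, as a rough sketch of the described transformation.

    Features are min-max normalized column-wise, zero-padded up to the next
    perfect square, and reshaped to (side, side) uint8 images.
    """
    X = X.astype(np.float64)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0                 # avoid division by zero for constant columns
    X_norm = (X - col_min) / col_range              # values now in [0, 1]

    side = math.ceil(math.sqrt(X.shape[1]))         # smallest square that fits all features
    padded = np.zeros((X.shape[0], side * side))
    padded[:, : X.shape[1]] = X_norm                # zero padding is an assumption

    return (padded.reshape(-1, side, side) * 255).astype(np.uint8)


if __name__ == "__main__":
    X = np.random.rand(4, 10)                       # 4 rows, 10 features -> 4x4 images
    print(tabular_rows_to_images(X).shape)          # (4, 4, 4)
```
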
Citations: 0
Trainable-parameter-free structural-diversity message passing for graph neural networks
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108711
Mingyue Kong, Yinglong Zhang, Chengda Xu, Xuewen Xia, Xing Xu
Graph Neural Networks (GNNs) have achieved strong performance in structured data modeling such as node classification. However, real-world graphs often exhibit heterogeneous neighborhoods and complex feature distributions, while mainstream approaches rely on many learnable parameters and apply uniform aggregation to all neighbors. This lack of explicit modeling for structural diversity often leads to representation homogenization, semantic degradation, and poor adaptability under challenging conditions such as low supervision or class imbalance. To address these limitations, we propose a trainable-parameter-free graph neural network framework, termed the Structural-Diversity Graph Neural Network (SDGNN), which operationalizes structural diversity in message passing. At its core, the Structural-Diversity Message Passing (SDMP) mechanism performs within-group statistics followed by cross-group selection, thereby capturing neighborhood heterogeneity while stabilizing feature semantics. SDGNN further incorporates complementary structure-driven and feature-driven partitioning strategies, together with a normalized-propagation-based global structural enhancer, to enhance adaptability across diverse graphs. Extensive experiments on nine public benchmark datasets and an interdisciplinary PubMed citation network demonstrate that SDGNN consistently outperforms mainstream GNNs, especially under low supervision, class imbalance, and cross-domain transfer. The full implementation, including code and configurations, is publicly available at: https://github.com/mingyue15694/SGDNN/tree/main.
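
The abstract describes SDMP only at a high level, so the snippet below is a loose, trainable-parameter-free reading of "within-group statistics followed by cross-group selection": neighbors are split into groups by a degree-based rule (standing in for the structure-driven partitioning), each group is summarized by its mean, and one summary is selected per node. Both the partitioning rule and the selection criterion are assumptions for illustration, not the paper's actual mechanism.

```python
import numpy as np


def structural_diversity_message_passing(adj: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """One trainable-parameter-free propagation step: within-group statistics,
    then cross-group selection (both rules here are illustrative assumptions).

    Neighbors are split into two groups by degree, each group is summarized by
    its mean feature, and the summary farthest from the node's own feature is
    kept as the message before a simple averaging update.
    """
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    out = feats.copy()
    for i in range(n):
        neigh = np.nonzero(adj[i])[0]
        if neigh.size == 0:
            continue
        median_deg = np.median(degrees[neigh])
        groups = [neigh[degrees[neigh] <= median_deg], neigh[degrees[neigh] > median_deg]]
        summaries = [feats[g].mean(axis=0) for g in groups if g.size > 0]   # within-group statistics
        best = max(summaries, key=lambda s: np.linalg.norm(s - feats[i]))   # cross-group selection
        out[i] = 0.5 * (feats[i] + best)                                    # parameter-free update
    return out


if __name__ == "__main__":
    adj = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
    feats = np.random.rand(4, 8)
    print(structural_diversity_message_passing(adj, feats).shape)  # (4, 8)
```
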
Citations: 0
TransUTD: Underwater cross-domain collaborative spatial-temporal transformer detector.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108705
Bingxun Zhao, Xiao Han, Ruihao Sui, Yuan Chen

Research on underwater object detection has primarily focused on addressing degraded imagery. Single-frame feature refinement is inherently limited by restricted static spatial information, while joint image enhancement and detection paradigms encounter non-trivial challenges arising from irreversible artifacts and conflicting optimization objectives. In contrast, temporal information from video sequences offers a direct solution. Temporal semantic information enhances the feature representation of degraded underwater frames, whereas temporal positional cues furnish dynamic geometric associations that facilitate precise object localization. We propose Transformer Underwater Spatial-Temporal Cross-domain Collaborative Detection (TransUTD), reformulating underwater degraded feature representation as a temporal contextual modeling problem. By synergistic exploitation of spatial-temporal information, TransUTD naturally learns complementary features across frames to compensate for feature degradation in key frames, rather than relying on specific heuristic components. This simplifies the detection pipeline and eliminates hand-crafted modules. In our framework, the spatial-temporal fusion encoder aggregates multi-frame features to strengthen semantic representations in degraded images. The spatial-temporal query interaction refines localization in complex underwater scenes by correlating spatial-temporal geometric cues. Finally, the temporal hybrid collaborative decoder performs dense supervision through collaborative optimization of temporal positive queries. Concurrently, we construct UVID, the first underwater video object detection dataset. Experimental evaluations demonstrate that TransUTD achieves state-of-the-art performance, delivering AP improvements of 1.5% and 1.9% on the DUO and UVID datasets, respectively. Moreover, it attains near SOTA performance on ImageNetVID with AP50 of 86.0%. Our dataset and code are available at https://github.com/Anchor1566/TransUTD.

Citations: 0
Deep Softtriple hashing for Multi-Label cross-modal retrieval.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108713
Shuo Han, Qibing Qin, Jinkui Hou, Wenfeng Zhang, Lei Huang

Hashing techniques are widely adopted in large-scale retrieval due to their low time and space complexity. Existing deep cross-modal hashing methods mostly rely on mini-batch training, where only a limited number of samples are processed in each iteration, often resulting in incomplete neighborhood exploration and sub-optimal embedding learning, especially on complex multi-label datasets. To address this limitation, recent studies have optimized the SoftMax loss, which is essentially equivalent to a smoothed triplet constraint with a single center assigned to each class. However, in practical retrieval scenarios, class distributions in the embedding space often contain multiple semantic clusters. Modeling each class with only one center fails to capture these intra-class local structures, thereby widening the semantic gap between heterogeneous samples. To alleviate this issue, we propose a novel deep cross-modal hashing framework, Deep SoftTriple Hashing (DSTH), which learns compact hash codes to better preserve semantic similarities in the embedding space. The framework introduces multiple centers for each class to effectively model the implicit distribution of heterogeneous samples and reduce intra-class semantic variance. To determine the number of centers, a class-center strategy is developed, where similar centers are encouraged to aggregate through an L2,1 regularization to obtain a compact set of centers. In addition, a semantic position quantization loss is introduced to minimize quantization error and enhance the discriminability of binary codes. Extensive experiments on three multi-label datasets demonstrate the effectiveness of DSTH in cross-modal retrieval, achieving absolute mAP improvements of 1.2%-9.3% over strong baselines while consistently maintaining superior performance on PR curves. The source code is available at: https://github.com/QinLab-WFU/DSTH-SoftTriple.
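
The multi-center idea follows the SoftTriple formulation: each class keeps K centers, and a sample's similarity to a class is a softmax-weighted combination of its similarities to that class's centers, relaxing the single-center SoftMax bottleneck described above. The sketch below illustrates that relaxed similarity; the temperature values, the number of centers, and the omission of the L2,1 center-merging regularizer and the quantization loss are simplifications, not the paper's exact DSTH objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftTripleHead(nn.Module):
    """Sketch of a SoftTriple-style head with K centers per class.

    Similarity to a class is a softmax-weighted sum over that class's centers,
    so classes whose embeddings form several semantic clusters are modeled
    better than with a single center per class.
    """

    def __init__(self, dim: int, n_classes: int, k: int = 4, gamma: float = 0.1, lam: float = 20.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, k, dim))
        self.gamma, self.lam = gamma, lam            # assumed temperature hyperparameters

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        emb = F.normalize(emb, dim=-1)
        centers = F.normalize(self.centers, dim=-1)
        sims = torch.einsum("bd,ckd->bck", emb, centers)   # (batch, n_classes, k) cosine similarities
        weights = torch.softmax(sims / self.gamma, dim=-1)  # softmax over each class's centers
        class_sim = (weights * sims).sum(dim=-1)            # relaxed per-class similarity
        return F.cross_entropy(self.lam * class_sim, labels)


if __name__ == "__main__":
    head = SoftTripleHead(dim=64, n_classes=10, k=4)
    emb = torch.randn(8, 64)
    labels = torch.randint(0, 10, (8,))
    print(head(emb, labels).item())
```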

Citations: 0
Select then fusion: An effective multi-atlas brain network analysis method with sparse and uncertain mechanism.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108698
Jiashuang Huang, Zhan Su, Shu Jiang, Tao Hou, Mingliang Wang, Weiping Ding

Multi-atlas brain networks offer a more comprehensive and intricate understanding than a single atlas in identifying brain disorders. Traditional multi-atlas analysis methods depend on some simple fusion methods (i.e., addition and concatenation) but do not consider the information redundancy caused by increased brain regions and uncertain information between multiple atlases. To address this, we propose an effective multi-atlas brain network analysis method with a sparse and uncertain mechanism, called Sparse and Uncertain Fusion Neural Network (SUFNN). We first construct a multi-atlas brain network based on functional magnetic resonance imaging (fMRI) using different atlases. Then, an attention-enhanced module is used to learn the features of each atlas. These features are fed into the multi-atlas brain regions selection module, which can select disease-related brain regions based on their importance scores. Subsequently, the model employs the selected features for downstream processing. Finally, we employ an uncertain fusion module that determines the uncertainty of each atlas and performs an uncertain fusion strategy to get the results at the evidence level. Experimental results on the SRPBS dataset demonstrate that our SUFNN outperforms several state-of-the-art methods in identifying brain disorders.

Citations: 0
A noise robust and distribution-adaptive framework for multivariate time series anomaly detection.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108695
Yanling Du, Ziliang Yang, Baozeng Chang, Jingxia Gao, Xiaojia Bao, Wei Song

Existing unsupervised anomaly detection methods for multivariate time series (MTS) have demonstrated advanced performance on numerous public datasets. However, these methods exhibit two critical limitations: (1) the assumption of completely noise-free training data contradicts real-world conditions, where normal samples are inevitably contaminated; (2) the inherent non-stationarity of MTS induces distribution shift, leading to biased learning and degraded generalization capabilities. This paper proposes NORDA, a novel MTS anomaly detection framework that integrates a multi-order difference mechanism with distribution shift optimization. Firstly, the multi-order difference mechanism performs multi-order explicit differencing on raw temporal signals, effectively mitigating noise interference during representation learning. Secondly, a mixed reversible normalization module is proposed, employing a normalization network with multiple statistical features to dynamically model non-stationary variations across variables. This module removes and restores the non-stationary properties of MTS through a symmetric reversible architecture, thereby enhancing the model's dynamic adaptability to distribution shift. By synergistically integrating the two aforementioned modules with a Transformer-based multi-layer encoder, the framework extracts robust latent representations by modeling inter-channel dependencies in the differentially processed data streams. Extensive experiments on seven benchmark datasets demonstrate that NORDA significantly outperforms sixteen typical baseline methods while exhibiting strong robustness against noise contamination.
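
The multi-order difference mechanism can be illustrated with a short NumPy sketch: the raw window is augmented with its explicit first- and second-order differences, zero-padded back to the original length, giving the encoder de-trended views alongside the raw signal. Concatenating the views along the channel axis is an assumption about how they are combined; the actual NORDA pipeline may fuse them differently.

```python
import numpy as np


def multi_order_differences(window: np.ndarray, max_order: int = 2) -> np.ndarray:
    """Augment a multivariate window (time, channels) with its explicit
    differences up to `max_order`, each zero-padded at the front so every view
    keeps the original length, then concatenated along the channel axis.
    """
    views = [window]
    current = window
    for _ in range(max_order):
        current = np.diff(current, axis=0)          # next-order difference (one row shorter)
        pad = np.zeros((window.shape[0] - current.shape[0], window.shape[1]))
        views.append(np.vstack([pad, current]))     # restore original length
    return np.concatenate(views, axis=1)            # (time, channels * (max_order + 1))


if __name__ == "__main__":
    w = np.cumsum(np.random.randn(100, 3), axis=0)  # non-stationary toy window
    print(multi_order_differences(w).shape)         # (100, 9)
```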

Citations: 0
Rethinking multivariate modeling in long-term forecasting: an efficient univariate framework with power decomposition and post-Calibration.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108710
Hongchi Chen, Jifei Tang, Lanhua Xia, Yuanjin Bao

Long-term time series forecasting (LTSF) is critical to industrial applications. While recent advances mainly focus on modeling complex multivariate interactions, the practical benefits can be marginal in the face of challenges such as asynchronous data drifts, noise interference, and abrupt changes. This study demonstrates that a well-designed univariate model can be more effective and efficient. We propose a univariate model with Power Decomposition and online Post-Calibration (PDCNet), which incorporates two novel mechanisms. 1) Power decomposition (P3D) disentangles time series based on the distribution of data power, significantly enhancing data predictability and mitigating the obscuring effect of dominant periodicities; a joint predictability-ACF analysis is introduced to determine optimal decomposition thresholds. 2) By designing an abrupt factor M that classifies morphological changes in the data, together with a correction dictionary built from historical performance, a lightweight online post-calibration adapts to pattern drifts without retraining the main model. Comprehensive experiments show that PDCNet consistently outperforms state-of-the-art models on univariate tasks. Through simple aggregation, it also achieves top-tier multivariate performance. P3D brings an average 15% improvement in 68.57% of cases, while calibration further improves accuracy by 16% in calibrated regions. Notably, PDCNet reduces GPU memory usage by up to 95% compared to multivariate counterparts in extreme-scale tasks. Our work shows that capturing the internal temporal dependencies within each variable is a more efficient and practical design for LTSF. PDCNet can serve as a competitive baseline, offering superior performance with a significantly reduced memory footprint. The related source code and configuration files can be accessed at https://github.com/hongchichen/PDCNet.
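
Power decomposition based on the data's power distribution can be sketched with a plain FFT: frequency components holding more than a threshold fraction of the total spectral power are reconstructed as the dominant-periodicity part, and the remainder is the residual. The fixed power fraction below is a stand-in for the paper's predictability-ACF joint analysis, which is what actually selects the decomposition threshold.

```python
import numpy as np


def power_decompose(series: np.ndarray, power_fraction: float = 0.05):
    """Split a univariate series into a dominant-periodicity component and a
    residual based on spectral power (sketch of a power-based decomposition).

    Frequencies holding more than `power_fraction` of total spectral power go
    to the dominant component; the rest forms the residual. The fixed fraction
    is an assumption standing in for a predictability-ACF-derived threshold.
    """
    spectrum = np.fft.rfft(series)
    power = np.abs(spectrum) ** 2
    mask = power > power_fraction * power.sum()            # dominant frequencies
    dominant = np.fft.irfft(np.where(mask, spectrum, 0), n=len(series))
    residual = series - dominant
    return dominant, residual


if __name__ == "__main__":
    t = np.arange(512)
    x = np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(512)  # daily-like cycle plus noise
    dom, res = power_decompose(x)
    print(dom.shape, res.shape)                                   # (512,) (512,)
```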

Citations: 0
Multi-view graph clustering via dual attention fusion and collaborative optimization.
IF 6.3 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-10 DOI: 10.1016/j.neunet.2026.108704
Zuowei Wang, Sen Xu, Naixuan Guo, Xuesheng Bian, Xiufang Xu, Shanliang Yao, Xianye Ben, Tian Zhou

Multi-view graph clustering, a fundamental task in data mining and machine learning, aims to partition nodes into disjoint groups by leveraging complementary information from multiple data sources. Although significant progress has been made, existing methods often struggle to effectively capture both the unique structural information within each view and the complementary relationships across different views. Moreover, the lack of mechanisms to enforce global semantic consistency frequently results in unstable consensus representations and degraded clustering quality. To address these issues, we propose a novel end-to-end method, Multi-view Graph Clustering via Dual attention fusion and Collaborative optimization (MGCDC). Specifically, each view is first encoded using a graph attention autoencoder to obtain view-specific node embeddings. These embeddings are then integrated via a view-level attention mechanism to generate a unified consensus representation. To guide the learning process, we introduce two collaborative optimization objectives. First, a cross-view cluster alignment loss is employed to jointly perform self-training learning on both the view-specific and consensus embeddings. Second, a semantic consistency enhancement loss is introduced to maximize mutual information between node embeddings and their corresponding cluster summaries. The entire model is optimized end-to-end by jointly learning node representations, integrating multi-view information, and refining cluster assignments. Extensive experiments on five benchmark datasets demonstrate that MGCDC achieves highly competitive performance compared to state-of-the-art methods.
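
View-level attention fusion of per-view node embeddings is a standard construct and can be sketched compactly: each view's embeddings are scored by a small shared network, the scores are softmax-normalized across views, and the consensus representation is the weighted sum. The particular scoring network and the mean-pooling of per-node scores below are assumptions, not necessarily MGCDC's exact fusion.

```python
import torch
import torch.nn as nn


class ViewAttentionFusion(nn.Module):
    """Sketch of view-level attention: per-view node embeddings are weighted by
    softmax-normalized view scores and summed into a consensus embedding."""

    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        # assumed scoring network: shared two-layer MLP producing one score per node
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1, bias=False))

    def forward(self, view_embs: torch.Tensor) -> torch.Tensor:
        # view_embs: (n_views, n_nodes, dim) embeddings from per-view graph attention encoders
        scores = self.score(view_embs).mean(dim=1)                  # (n_views, 1), one score per view
        weights = torch.softmax(scores, dim=0)                      # normalize across views
        consensus = (weights.unsqueeze(1) * view_embs).sum(dim=0)   # (n_nodes, dim) consensus embedding
        return consensus


if __name__ == "__main__":
    views = torch.randn(3, 100, 16)               # 3 views, 100 nodes, 16-dim embeddings
    print(ViewAttentionFusion(16)(views).shape)   # torch.Size([100, 16])
```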

Citations: 0