
Latest Articles in IEEE Transactions on Pattern Analysis and Machine Intelligence

CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649001
Ziyang Gong, Zhixiang Wei, Di Wang, Xiaoxing Hu, Xianzheng Ma, Hongruixuan Chen, Yuru Jia, Yupeng Deng, Zhenming Ji, Xiangwei Zhu, Xue Yang, Naoto Yokoya, Jing Zhang, Bo Du, Junchi Yan, Liangpei Zhang

Due to the substantial domain gaps in Remote Sensing (RS) images, which are characterized by variabilities such as location, wavelength, and sensor type, Remote Sensing Domain Generalization (RSDG) has emerged as a critical and valuable research frontier, focusing on developing models that generalize effectively across diverse scenarios. However, this area remains underexplored: (1) current cross-domain methods primarily focus on Domain Adaptation (DA), which adapts models to predefined domains rather than to unseen ones; (2) few studies target the RSDG problem, especially for semantic segmentation; existing related models are developed for specific unknown domains and struggle with underfitting on other unseen scenarios; (3) existing RS foundation models tend to prioritize in-domain performance over cross-domain generalization. To this end, we introduce CrossEarth, the first vision foundation model for RSDG semantic segmentation. CrossEarth demonstrates strong cross-domain generalization through a specially designed data-level Earth-Style Injection pipeline and a model-level Multi-Task Training pipeline. In addition, for the semantic segmentation task, we have curated an RSDG benchmark comprising 32 semantic segmentation scenarios across various regions, spectral bands, platforms, and climates, providing comprehensive evaluations of the generalizability of future RSDG models. Extensive experiments on this collection demonstrate the superiority of CrossEarth over existing state-of-the-art methods.
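The abstract does not spell out how Earth-Style Injection works; a common way to realize data-level style injection for domain generalization is to swap the low-frequency Fourier amplitude of an image with that of a reference from another domain. The sketch below illustrates that generic technique only, and may differ from the paper's actual pipeline.

```python
import numpy as np

def fourier_style_inject(content: np.ndarray, style: np.ndarray, beta: float = 0.05) -> np.ndarray:
    """Swap the low-frequency amplitude spectrum of `content` with that of `style`.

    A generic Fourier-domain style-injection augmentation (illustrative only).
    Inputs are HxWxC float arrays in [0, 1]; `beta` controls the swapped band size.
    """
    fft_c = np.fft.fft2(content, axes=(0, 1))
    fft_s = np.fft.fft2(style, axes=(0, 1))
    amp_c, pha_c = np.abs(fft_c), np.angle(fft_c)
    amp_s = np.abs(fft_s)

    # Center the spectra so low frequencies sit in the middle.
    amp_c = np.fft.fftshift(amp_c, axes=(0, 1))
    amp_s = np.fft.fftshift(amp_s, axes=(0, 1))

    h, w = content.shape[:2]
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    amp_c[ch - bh:ch + bh, cw - bw:cw + bw] = amp_s[ch - bh:ch + bh, cw - bw:cw + bw]

    amp_c = np.fft.ifftshift(amp_c, axes=(0, 1))
    stylized = np.fft.ifft2(amp_c * np.exp(1j * pha_c), axes=(0, 1))
    return np.clip(np.real(stylized), 0.0, 1.0)
```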

Citations: 0
Bi-C2R: Bidirectional Continual Compatible Representation for Re-Indexing Free Lifelong Person Re-Identification.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649078
Zhenyu Cui, Jiahuan Zhou, Yuxin Peng

Lifelong person Re-IDentification (L-ReID) exploits sequentially collected data to continuously train and update a ReID model, focusing on overall performance across all data. Its main challenge is avoiding catastrophic forgetting of old knowledge while training on new data. Existing L-ReID methods typically re-extract features for all historical gallery images after each update, a process known as "re-indexing". However, historical gallery data often cannot be stored directly due to privacy concerns, and re-indexing large-scale gallery images is costly. As a result, retrieval becomes incompatible between query features extracted by the updated model and gallery features extracted before the update, greatly impairing re-identification performance. To tackle this issue, this paper focuses on a new task called Re-index Free Lifelong person Re-IDentification (RFL-ReID), which requires performing lifelong person re-identification without re-indexing historical gallery images. RFL-ReID is therefore more challenging than L-ReID: it requires continuously learning and balancing new and old knowledge from diverse streaming data while keeping the features output by the new and old models mutually compatible. To this end, we propose a Bidirectional Continual Compatible Representation (Bi-C2R) framework that continuously updates the gallery features extracted by the old model to perform efficient L-ReID in a compatible manner. Specifically, a bidirectional compatible transfer network is first designed to bridge new and old knowledge and continuously map old gallery features into the new feature space after each update. Second, a bidirectional compatible distillation module and a bidirectional anti-forgetting distillation module are designed to balance the compatibility between new and old knowledge in the dual feature spaces. Third, a feature-level exponential moving average strategy is designed to adaptively fill the knowledge gaps between different data domains. Finally, we verify the proposed Bi-C2R method through theoretical analysis and extensive experiments on multiple benchmarks, demonstrating leading performance on both the introduced RFL-ReID task and the traditional L-ReID task.
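To make the re-indexing-free setting concrete: the core requirement is that features from the updated model remain directly comparable with gallery features extracted before the update. Below is a minimal sketch of a generic cross-model compatibility objective; `compatibility_loss` is a hypothetical helper for illustration, not the paper's Bi-C2R loss.

```python
import torch
import torch.nn.functional as F

def compatibility_loss(new_feats, old_feats, labels, tau: float = 0.07):
    """Generic backward-compatibility objective (a sketch, not Bi-C2R itself).

    Pulls each new-model feature toward old-model features of the same identity,
    so new queries stay searchable against an un-re-indexed old gallery.
    """
    new_feats = F.normalize(new_feats, dim=1)
    old_feats = F.normalize(old_feats, dim=1)
    logits = new_feats @ old_feats.t() / tau            # cross-model similarities
    targets = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()
    targets = targets / targets.sum(dim=1, keepdim=True)  # soft positive targets
    return F.cross_entropy(logits, targets)             # soft-target cross-entropy

# Toy usage: 8 samples, 128-D features, 4 identities.
new = torch.randn(8, 128)
old = torch.randn(8, 128)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(compatibility_loss(new, old, ids))
```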

Citations: 0
Continuous Review and Timely Correction: Enhancing the Resistance to Noisy Labels Via Self-Not-True and Class-Wise Distillation.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649111
Long Lan, Jingyi Wang, Xinghao Wu, Bo Han, Xinwang Liu

Deep neural networks possess remarkable learning capabilities and expressive power, but this makes them vulnerable to overfitting, especially when they encounter mislabeled data. A notable phenomenon called the memorization effect occurs when networks first learn the correctly labeled data and later memorize the mislabeled instances. While early stopping can mitigate overfitting, it doesn't entirely prevent networks from adapting to incorrect labels during the initial training phases, which can result in losing valuable insights from accurate data. Moreover, early stopping cannot rectify the mistakes caused by mislabeled inputs, underscoring the need for improved strategies. In this paper, we introduce an innovative mechanism for continuous review and timely correction of learned knowledge. Our approach allows the network to repeatedly revisit and reinforce correct information while promptly addressing any inaccuracies stemming from mislabeled data. We present a novel method called self-not-true-distillation (SNTD). This technique employs self-distillation, where the network from previous training iterations acts as a teacher, guiding the current network to review and solidify its understanding of accurate labels. Crucially, SNTD masks the true class label in the logits during this process, concentrating on the non-true classes to correct any erroneous knowledge that may have been acquired. We also recognize that different data classes follow distinct learning trajectories. A single teacher network might struggle to effectively guide the learning of all classes at once, which necessitates selecting different teacher networks for each specific class. Additionally, the influence of the teacher network's guidance varies throughout the training process. To address these challenges, we propose SNTD+, which integrates a class-wise distillation strategy along with a dynamic weight adjustment mechanism. Together, these enhancements significantly bolster SNTD's robustness in tackling complex scenarios characterized by label noise.
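The masking step is concrete enough to sketch: remove the ground-truth class from both student and teacher logits, then distill only over the remaining (non-true) classes. The following is a minimal PyTorch rendering of that idea under our own simplifying assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def self_not_true_distillation(student_logits, teacher_logits, labels, T: float = 2.0):
    """Distill only over the non-true classes (a sketch of the SNTD idea).

    The ground-truth class logit is masked out of both student and teacher,
    so the student reviews relational knowledge among the remaining classes
    without being pushed toward a potentially noisy label.
    """
    mask = F.one_hot(labels, student_logits.size(1)).bool()
    s = student_logits.masked_fill(mask, float('-inf'))
    t = teacher_logits.masked_fill(mask, float('-inf'))
    p_t = F.softmax(t / T, dim=1)                    # teacher prob. over non-true classes
    log_p_s = F.log_softmax(s / T, dim=1)
    # KL(p_t || p_s); masked entries are NaN (0 * -inf) and are zeroed out.
    kl = (p_t * (torch.log(p_t.clamp_min(1e-12)) - log_p_s)).masked_fill(mask, 0.0)
    return (T * T) * kl.sum(dim=1).mean()

# Toy usage: teacher = model snapshot from a previous training iteration.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
y = torch.tensor([1, 3, 5, 7])
print(self_not_true_distillation(student, teacher, y))
```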

Citations: 0
Spike Camera Optical Flow Estimation Based on Continuous Spike Streams.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649050
Rui Zhao, Ruiqin Xiong, Dongkai Wang, Shiyu Xuan, Jian Zhang, Xiaopeng Fan, Tiejun Huang

Spike camera is an emerging bio-inspired vision sensor with ultra-high temporal resolution. It records scenes by accumulating photons and outputting binary spike streams. Optical flow estimation aims to estimate pixel-level correspondences between different moments, describing motion information over time, which is a key task for spike cameras. High-quality optical flow is important because motion information is a foundation for analyzing spikes. However, extracting stable light-intensity information from spikes is difficult due to the randomness of binary spikes. Besides, the continuity of spikes can offer contextual information for optical flow. In this paper, we propose Spike2Flow++, a network that estimates optical flow for spike cameras. In Spike2Flow++, we propose a differential of spike firing time (DSFT) representation to encode the information in binary spikes. Moreover, we propose a dual DSFT representation and a dual correlation construction to extract stable light-intensity information for reliable correlations. To use the continuity of spikes as motion contextual information, we propose a joint correlation decoding (JCD) scheme that jointly estimates a series of flow fields. To adaptively fuse different motions in JCD, we propose a global motion bank aggregation that constructs an information bank for all motions and adaptively extracts contexts from the bank at each iteration during the recurrent decoding of each motion. To train and evaluate our network, we construct the Real Scenes with Spikes and Flow++ (RSSF++) dataset based on real-world scenes. Experiments demonstrate that Spike2Flow++ achieves state-of-the-art performance on RSSF++, photo-realistic high-speed motion (PHM), and real-captured data.
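As rough intuition for DSFT: under an integrate-and-fire model, the interval between consecutive spikes is inversely proportional to light intensity, so differences of firing times carry a stable intensity cue despite the binary randomness. The sketch below computes a simple per-pixel firing-time statistic from a binary stream; the paper's exact DSFT formulation may differ.

```python
import numpy as np

def spike_firing_time_differential(spikes: np.ndarray) -> np.ndarray:
    """Per-pixel differences between consecutive firing times (illustrative only).

    `spikes` is a (T, H, W) binary stream. The inter-spike interval is roughly
    inversely proportional to light intensity, so its differential is a stable
    intensity cue. Returns the mean interval per pixel (NaN if < 2 spikes).
    """
    T, H, W = spikes.shape
    out = np.full((H, W), np.nan, dtype=np.float32)
    for y in range(H):
        for x in range(W):
            t_fire = np.flatnonzero(spikes[:, y, x])   # firing timestamps
            if t_fire.size >= 2:
                out[y, x] = np.diff(t_fire).mean()     # mean inter-spike interval
    return out

# Toy usage: 100 time steps of a 4x4 random binary spike stream.
stream = (np.random.rand(100, 4, 4) < 0.2).astype(np.uint8)
print(spike_firing_time_differential(stream))
```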

Citations: 0
Toward Efficient Semi-Supervised Object Detection With Detection Transformer
IF 18.6 Pub Date: 2025-12-10 DOI: 10.1109/TPAMI.2025.3642123
Jiacheng Zhang;Jiaming Li;Xiangru Lin;Wei Zhang;Xiao Tan;Hongbo Gao;Jingdong Wang;Guanbin Li
Semi-supervised object detection (SSOD) mitigates the annotation burden in object detection by leveraging unlabeled data, providing a scalable solution for modern perception systems. Concurrently, detection transformers (DETRs) have emerged as a popular end-to-end framework, offering advantages such as non-maximum suppression (NMS)-free inference. However, existing SSOD methods are predominantly designed for conventional detectors, leaving the exploration of DETR-based SSOD largely uncharted. This paper presents a systematic study to bridge this gap. We begin by identifying two principal obstacles in semi-supervised DETR training: (1) the inherent one-to-one assignment mechanism of DETRs is highly sensitive to noisy pseudo-labels, which impedes training efficiency; and (2) the query-based decoder architecture complicates the design of an effective consistency regularization scheme, limiting further performance gains. To address these challenges, we propose Semi-DETR++, a novel framework for efficient SSOD with DETRs. Our approach introduces a stage-wise hybrid matching strategy that enhances robustness to noisy pseudo-labels by synergistically combining one-to-many and one-to-one assignments while preserving NMS-free inference. Furthermore, based on our observation of the unique layer-wise decoding behavior in DETRs, we develop a simple yet effective re-decode query consistency training method to regularize the decoder. Extensive experiments demonstrate that Semi-DETR++ enables more efficient semi-supervised learning across various DETR architectures, outperforming existing methods by significant margins. The proposed components are also flexible and versatile, showing superior generalization by readily extending to semi-supervised segmentation tasks.
Code is available at https://github.com/JCZ404/Semi-DETR.
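The contrast between the two assignment regimes can be made concrete with a toy cost matrix: one-to-many assignment hands each ground-truth box its k cheapest queries for denser, noise-tolerant supervision, while Hungarian matching enforces the one-to-one property that keeps inference NMS-free. The sketch below, with a hypothetical `hybrid_assign` helper, illustrates the general idea only, not the paper's exact stage-wise rule.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hybrid_assign(cost: np.ndarray, k: int = 3, one_to_many: bool = True):
    """Stage-wise assignment sketch (illustrative, simplified).

    `cost[q, g]` is the matching cost between query q and ground-truth box g.
    Early training: one-to-many (each GT takes its k cheapest queries).
    Later training: one-to-one Hungarian matching.
    """
    if one_to_many:
        matches = []
        for g in range(cost.shape[1]):
            for q in np.argsort(cost[:, g])[:k]:   # k cheapest queries for this GT
                matches.append((int(q), g))
        return matches
    rows, cols = linear_sum_assignment(cost)        # optimal one-to-one matching
    return list(zip(rows.tolist(), cols.tolist()))

cost = np.random.rand(10, 3)                        # 10 queries, 3 GT boxes
print(hybrid_assign(cost, k=2, one_to_many=True))   # dense supervision
print(hybrid_assign(cost, one_to_many=False))       # NMS-free regime
```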
{"title":"Toward Efficient Semi-Supervised Object Detection With Detection Transformer","authors":"Jiacheng Zhang;Jiaming Li;Xiangru Lin;Wei Zhang;Xiao Tan;Hongbo Gao;Jingdong Wang;Guanbin Li","doi":"10.1109/TPAMI.2025.3642123","DOIUrl":"10.1109/TPAMI.2025.3642123","url":null,"abstract":"Semi-supervised object detection (SSOD) mitigates the annotation burden in object detection by leveraging unlabeled data, providing a scalable solution for modern perception systems. Concurrently, detection transformers (DETRs) have emerged as a popular end-to-end framework, offering advantages such as non-maximum suppression (NMS)-free inference. However, existing SSOD methods are predominantly designed for conventional detectors, leaving the exploration of DETR-based SSOD largely uncharted. This paper presents a systematic study to bridge this gap. We begin by identifying two principal obstacles in semi-supervised DETR training: (1) the inherent one-to-one assignment mechanism of DETRs is highly sensitive to noisy pseudo-labels, which impedes training efficiency; and (2) the query-based decoder architecture complicates the design of an effective consistency regularization scheme, limiting further performance gains. To address these challenges, we propose Semi-DETR++, a novel framework for efficient SSOD with DETRs. Our approach introduces a stage-wise hybrid matching strategy that enhances robustness to noisy pseudo-labels by synergistically combining one-to-many and one-to-one assignments while preserving NMS-free inference. Furthermore, based on our observation of the unique layer-wise decoding behavior in DETRs, we develop a simple yet effective re-decode query consistency training method to regularize the decoder. Extensive experiments demonstrate that Semi-DETR++ enables more efficient semi-supervised learning across various DETR architectures, outperforming existing methods by significant margins. The proposed components are also flexible and versatile, showing superior generalization by readily extending to semi-supervised segmentation tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 3","pages":"3765-3782"},"PeriodicalIF":18.6,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145717354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Decentralized Federated Learning With Distributed Aggregation Weight Optimization
IF 18.6 Pub Date: 2025-12-05 DOI: 10.1109/TPAMI.2025.3640709
Zhiyuan Zhai;Xiaojun Yuan;Xin Wang;Geoffrey Ye Li
Decentralized federated learning (DFL) is an emerging paradigm that enables edge devices to collaboratively train a learning model via device-to-device (D2D) communication, without the coordination of a parameter server (PS). Aggregation weights, also known as mixing weights, are crucial to the DFL process and impact learning efficiency and accuracy. Conventional designs rely on a so-called central entity to collect all local information and conduct system optimization to obtain appropriate weights. In this paper, we develop a distributed aggregation weight optimization algorithm that aligns with the decentralized nature of DFL. We analyze convergence by quantitatively capturing the impact of the aggregation weights over decentralized communication networks. Based on this analysis, we formulate a learning performance optimization problem that designs the aggregation weights to minimize the derived convergence bound. The problem is further transformed into an eigenvalue optimization problem and solved in a distributed fashion by our proposed subgradient-based algorithm. In our algorithm, edge devices need only local information, obtained through local (D2D) communication, to compute the optimal aggregation weights, just like the learning itself. Therefore, the optimization, communication, and learning processes can all be conducted in a distributed fashion, yielding a genuinely distributed DFL system. Numerical results demonstrate the superiority of the proposed algorithm in practical DFL deployments.
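For intuition on why mixing weights become an eigenvalue problem: consensus speed in gossip-style averaging is governed by the second-largest eigenvalue magnitude of the mixing matrix. The sketch below builds the classical Metropolis-Hastings weights, which each device can compute from purely local degree information, and reports the resulting spectral gap; it is a baseline illustration, not the paper's optimizer.

```python
import numpy as np

def metropolis_weights(adj: np.ndarray) -> np.ndarray:
    """Classical Metropolis-Hastings mixing weights (a baseline sketch).

    Each device needs only its own degree and its neighbors' degrees, so the
    matrix is computable with purely local D2D exchanges. The result is
    symmetric and doubly stochastic, which guarantees consensus.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()     # self-weight absorbs the remainder
    return W

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
W = metropolis_weights(adj)
# Convergence speed is governed by the second-largest eigenvalue magnitude.
eigs = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
print("spectral gap:", 1.0 - eigs[1])
```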
Citations: 0
Harnessing Lightweight Transformer With Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation
IF 18.6 Pub Date: 2025-12-05 DOI: 10.1109/TPAMI.2025.3640233
Xinyu Liu;Zhen Chen;Wuyang Li;Chenxin Li;Yixuan Yuan
Transformers have shown remarkable performance in 3D medical image segmentation, but their high computational requirements and need for large amounts of labeled data limit their applicability. To address these challenges, we consider two crucial aspects: model efficiency and data efficiency. Specifically, we propose Light-UNETR, a lightweight transformer designed to achieve model efficiency. Light-UNETR features a Lightweight Dimension Reductive Attention (LIDR) module, which reduces spatial and channel dimensions while capturing both global and local features via multi-branch attention. Additionally, we introduce a Compact Gated Linear Unit (CGLU) to selectively control channel interaction with minimal parameters. Furthermore, we introduce a Contextual Synergic Enhancement (CSE) learning strategy, which aims to boost the data efficiency of Transformers. It first leverages the extrinsic contextual information to support the learning of unlabeled data with Attention-Guided Replacement, then applies Spatial Masking Consistency that utilizes intrinsic contextual information to enhance the spatial context reasoning for unlabeled data. Extensive experiments on various benchmarks demonstrate the superiority of our approach in both performance and efficiency. For example, with only 10% labeled data on the Left Atrial Segmentation dataset, our method surpasses BCP by 1.43% Jaccard while drastically reducing the FLOPs by 90.8% and parameters by 85.8%.
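One standard way to cut attention cost in the manner the abstract describes is to downsample the key/value tokens spatially before attention. The module below sketches that generic pattern for 3D volumes; `ReducedAttention` is an illustrative stand-in, not the paper's LIDR design.

```python
import torch
import torch.nn as nn

class ReducedAttention(nn.Module):
    """Attention over spatially downsampled keys/values (a generic sketch of
    the dimension-reduction idea; not the paper's exact LIDR module)."""

    def __init__(self, dim: int, heads: int = 4, reduction: int = 2):
        super().__init__()
        self.pool = nn.AvgPool3d(reduction)           # shrink D, H, W for K/V
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                             # x: (B, C, D, H, W)
        q = x.flatten(2).transpose(1, 2)              # (B, DHW, C) queries
        kv = self.pool(x).flatten(2).transpose(1, 2)  # far fewer key/value tokens
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape_as(x)

x = torch.randn(1, 32, 8, 16, 16)
print(ReducedAttention(32)(x).shape)   # torch.Size([1, 32, 8, 16, 16])
```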
Citations: 0
Heatmap Pooling Network for Action Recognition From RGB Videos
IF 18.6 Pub Date: 2025-12-05 DOI: 10.1109/TPAMI.2025.3640697
Mengyuan Liu;Jinfu Liu;Yongkang Jiang;Bin He
Human action recognition (HAR) in videos has garnered widespread attention due to the rich information in RGB videos. Nevertheless, existing methods for extracting deep features from RGB videos face challenges such as information redundancy, susceptibility to noise, and high storage costs. To address these issues and fully harness the useful information in videos, we propose a novel heatmap pooling network (HP-Net) for action recognition from videos, which extracts information-rich, robust, and concise pooled features of the human body through a feedback pooling module. The extracted pooled features demonstrate clear performance advantages over previously used pose data and heatmap features from videos. In addition, we design a spatial-motion co-learning module and a text refinement modulation module to integrate the extracted pooled features with other multimodal data, enabling more robust action recognition. Extensive experiments on several benchmarks, namely NTU RGB+D 60, NTU RGB+D 120, Toyota-Smarthome, and UAV-Human, consistently verify the effectiveness of HP-Net, which outperforms existing human action recognition methods.
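The pooling idea can be illustrated directly: use per-joint heatmaps as spatial weights over backbone feature maps, yielding one compact descriptor per body part instead of a full, noisy feature map. A minimal sketch follows; this is a generic heatmap-weighted pooling, not the paper's feedback pooling module.

```python
import torch

def heatmap_pool(features: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
    """Pool spatial features under body-part heatmaps (a generic sketch).

    features: (B, C, H, W) backbone maps; heatmaps: (B, J, H, W) per-joint
    responses. Returns (B, J, C): one compact descriptor per joint, which is
    far smaller and less noise-prone than the full feature map.
    """
    # Normalize each heatmap into a spatial weighting distribution.
    w = heatmaps / heatmaps.sum(dim=(2, 3), keepdim=True).clamp_min(1e-6)
    return torch.einsum('bjhw,bchw->bjc', w, features)

feats = torch.randn(2, 256, 64, 48)
hmaps = torch.rand(2, 17, 64, 48)        # e.g. 17 COCO keypoints
print(heatmap_pool(feats, hmaps).shape)  # torch.Size([2, 17, 256])
```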
Citations: 0
Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining
IF 18.6 Pub Date: 2025-12-05 DOI: 10.1109/TPAMI.2025.3640589
Xiang Xu;Lingdong Kong;Hui Shuai;Wenwei Zhang;Liang Pan;Kai Chen;Ziwei Liu;Qingshan Liu
LiDAR representation learning has emerged as a promising approach to reducing reliance on costly and labor-intensive human annotations. While existing methods primarily focus on spatial alignment between LiDAR and camera sensors, they often overlook the temporal dynamics critical for capturing motion and scene continuity in driving scenarios. To address this limitation, we propose SuperFlow++, a novel framework that integrates spatiotemporal cues in both pretraining and downstream tasks using consecutive LiDAR-camera pairs. SuperFlow++ introduces four key components: (1) a view consistency alignment module to unify semantic information across camera views, (2) a dense-to-sparse consistency regularization mechanism to enhance feature robustness across varying point cloud densities, (3) a flow-based contrastive learning approach that models temporal relationships for improved scene understanding, and (4) a temporal voting strategy that propagates semantic information across LiDAR scans to improve prediction consistency. Extensive evaluations on 11 heterogeneous LiDAR datasets demonstrate that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. Furthermore, by scaling both 2D and 3D backbones during pretraining, we uncover emergent properties that provide deeper insights into developing scalable 3D foundation models. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
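Temporal voting admits a simple reading: once per-point predictions from consecutive scans are associated, each point's label can be decided by majority vote, smoothing out single-scan errors. The sketch below assumes the cross-scan association is already given, which is a strong simplification of the actual pipeline.

```python
import numpy as np

def temporal_vote(pred_stack: np.ndarray, ignore: int = -1) -> np.ndarray:
    """Majority vote over per-point predictions from consecutive scans
    (a simplified sketch of temporal voting; assumes points are already
    associated across scans, with `ignore` marking missing matches).

    pred_stack: (S, N) labels from S scans for N reference points.
    """
    S, N = pred_stack.shape
    out = np.empty(N, dtype=pred_stack.dtype)
    for i in range(N):
        votes = pred_stack[:, i]
        votes = votes[votes != ignore]                    # drop unmatched scans
        vals, counts = np.unique(votes, return_counts=True)
        out[i] = vals[np.argmax(counts)] if vals.size else ignore
    return out

# Toy usage: 3 scans voting on 5 points; -1 means "no match in that scan".
preds = np.array([[0, 1, 2, 2, -1],
                  [0, 1, 2, 1, -1],
                  [0, 2, 2, 1, -1]])
print(temporal_vote(preds))  # [0 1 2 1 -1]
```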
Citations: 0
Homophily Edge Augment Graph Neural Network for High-Class Homophily Variance Learning
IF 18.6 Pub Date: 2025-12-05 DOI: 10.1109/TPAMI.2025.3640635
Mingjian Guang;Rui Zhang;Dawei Cheng;Xiaoyang Wang;Xin Liu;Jie Yang;Yi Ouyang;Xian Wu;Yefeng Zheng
Graph Neural Networks (GNNs) have achieved remarkable success in machine learning tasks by learning the features of graph data. However, experiments show that vanilla GNNs fail to achieve good classification performance in graph anomaly detection. To address this issue, we propose and theoretically prove that the high Class Homophily Variance (CHV) characteristic is the reason behind the suboptimal performance of GNN models in anomaly detection tasks. Statistical analysis shows that in most standard node classification datasets, homophily levels are similar across all classes, so CHV is low. In contrast, graph anomaly detection datasets have high CHV, as benign nodes are highly homophilic while anomalies are not, leading to a clear separation. To mitigate its impact, we propose a novel GNN model named Homophily Edge Augment Graph Neural Network (HEAug). Different from previous work, our method emphasizes generating new edges with low CHV, using the original edges as an auxiliary. HEAug samples homophily adjacency matrices from scratch using a self-attention mechanism and leverages nodes that are relevant in the feature space but not directly connected in the original graph. Additionally, we modify the loss function to penalize the generation of unnecessary heterophilic edges. Extensive comparison experiments demonstrate that HEAug achieves the best performance across eight benchmark datasets, covering anomaly detection, edgeless node classification, and adversarial attack. We also define a heterophily attack that increases the CHV of other graphs, demonstrating the effectiveness of our theory and model in various scenarios.
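CHV itself is easy to estimate empirically under one plausible reading: compute edge homophily per class and take the variance across classes. The sketch below implements that reading; the paper's formal definition may differ.

```python
import numpy as np

def class_homophily_variance(edges: np.ndarray, labels: np.ndarray) -> float:
    """Variance of per-class edge homophily (one plausible reading of CHV).

    edges: (E, 2) undirected edge list; labels: (N,) node classes. For each
    class c, homophily is the fraction of edge endpoints in class c whose
    neighbor shares the label; CHV is the variance of these fractions.
    """
    both = np.concatenate([edges, edges[:, ::-1]])   # count each edge both ways
    src, dst = labels[both[:, 0]], labels[both[:, 1]]
    ratios = []
    for c in np.unique(labels):
        m = src == c
        if m.any():
            ratios.append((dst[m] == c).mean())
    return float(np.var(ratios))

# Toy graph: benign class 0 is homophilic, anomaly class 1 is not -> high CHV.
edges = np.array([[0, 1], [1, 2], [2, 0], [3, 0], [3, 2]])
labels = np.array([0, 0, 0, 1])
print(class_homophily_variance(edges, labels))
```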
Citations: 0