
IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Publications

GrowSP++: Growing Superpoints and Primitives for Unsupervised 3D Semantic Segmentation.
IF 18.6 Pub Date: 2026-01-02 DOI: 10.1109/TPAMI.2025.3650165
Zihui Zhang, Weisheng Dai, Bing Wang, Bo Li, Bo Yang

We study the problem of 3D semantic segmentation from raw point clouds. Unlike existing methods, which primarily rely on a large amount of human annotations for training neural networks, we propose GrowSP++, an unsupervised method that successfully identifies complex semantic classes for every point in 3D scenes without needing any type of human labels. Our method is composed of three major components: 1) a feature extractor incorporating 2D-3D feature distillation, 2) a superpoint constructor featuring progressively growing superpoints, and 3) a semantic primitive constructor with an additional growing strategy. The key to our method is the superpoint constructor together with the progressive growing strategy on both superpoints and semantic primitives, driving the feature extractor to progressively learn similar features for 3D points belonging to the same semantic class. We extensively evaluate our method on five challenging indoor and outdoor datasets, demonstrating state-of-the-art performance over all unsupervised baselines. We hope our work can inspire more advanced methods for unsupervised 3D semantic learning.

Citations: 0
Fast Multi-View Discrete Clustering Via Spectral Embedding Fusion.
IF 18.6 Pub Date: 2025-12-31 DOI: 10.1109/TPAMI.2025.3649521
Ben Yang, Xuetao Zhang, Zhiyuan Xue, Feiping Nie, Badong Chen

Multi-view spectral clustering (MVSC) has garnered growing interest across various real-world applications, owing to its flexibility in managing diverse data space structures. Nevertheless, the fusion of multiple $n \times n$ similarity matrices and the separate post-discretization process hinder the utilization of MVSC in large-scale tasks, where $n$ denotes the number of samples. Moreover, noise in different similarity matrices, along with the two-stage mismatch caused by post-discretization, reduces clustering effectiveness. To overcome these challenges, we establish a novel fast multi-view discrete clustering (FMVDC) model via spectral embedding fusion, which integrates spectral embedding matrices ($n \times c$, $c \ll n$) to directly obtain discrete sample categories, where $c$ indicates the number of clusters, bypassing the need for both similarity matrix fusion and post-discretization. To further enhance clustering efficiency, we employ an anchor-based spectral embedding strategy to decrease the computational complexity of spectral analysis from cubic to linear. Since gradient descent methods cannot handle discrete models, we propose a fast optimization strategy based on the coordinate descent method to solve the FMVDC model efficiently. Extensive studies demonstrate that FMVDC significantly improves clustering performance compared to existing state-of-the-art methods, particularly in large-scale clustering tasks.

Citations: 0
A Survey of Behavior Foundation Model: Next-Generation Whole-Body Control System of Humanoid Robots.
IF 18.6 Pub Date: 2025-12-30 DOI: 10.1109/TPAMI.2025.3649177
Mingqi Yuan, Tao Yu, Wenqi Ge, Xiuyong Yao, Dapeng Li, Huijiang Wang, Jiayu Chen, Bo Li, Wei Zhang, Wenjun Zeng, Hua Chen, Xin Jin

Humanoid robots are drawing significant attention as versatile platforms for complex motor control, human-robot interaction, and general-purpose physical intelligence. However, achieving efficient whole-body control (WBC) in humanoids remains a fundamental challenge due to sophisticated dynamics, underactuation, and diverse task requirements. While learning-based controllers have shown promise for complex tasks, their reliance on labor-intensive and costly retraining for new scenarios limits real-world applicability. To address these limitations, behavior(al) foundation models (BFMs) have emerged as a new paradigm that leverages large-scale pre-training to learn reusable primitive skills and broad behavioral priors, enabling zero-shot or rapid adaptation to a wide range of downstream tasks. In this paper, we present a comprehensive overview of BFMs for humanoid WBC, tracing their development across diverse pre-training pipelines. Furthermore, we discuss real-world applications, current limitations, urgent challenges, and future opportunities, positioning BFMs as a key approach toward scalable and general-purpose humanoid intelligence. Finally, we provide a curated and regularly updated collection of BFM papers and projects to facilitate further research, which is available at https://github.com/yuanmingqi/awesome-bfm-papers.

Citations: 0
On the Transferability and Discriminability of Representation Learning in Unsupervised Domain Adaptation.
IF 18.6 Pub Date: 2025-12-30 DOI: 10.1109/TPAMI.2025.3649294
Wenwen Qiang, Ziyin Gu, Lingyu Si, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong

In this paper, we addressed the limitation of relying solely on distribution alignment and source-domain empirical risk minimization in Unsupervised Domain Adaptation (UDA). Our information-theoretic analysis showed that this standard adversarial-based framework neglects the discriminability of target-domain features, leading to suboptimal performance. To bridge this theoretical-practical gap, we defined "good representation learning" as guaranteeing both transferability and discriminability, and proved that an additional loss term targeting target-domain discriminability is necessary. Building on these insights, we proposed a novel adversarial-based UDA framework that explicitly integrates a domain alignment objective with a discriminability-enhancing constraint. Instantiated as Domain-Invariant Representation Learning with Global and Local Consistency (RLGLC), our method leverages Asymmetrically-Relaxed Wasserstein of Wasserstein Distance (AR-WWD) to address class imbalance and semantic dimension weighting, and employs a local consistency mechanism to preserve fine-grained target-domain discriminative information. Extensive experiments across multiple benchmark datasets demonstrate that RLGLC consistently surpasses state-of-the-art methods, confirming the value of our theoretical perspective and underscoring the necessity of enforcing both transferability and discriminability in adversarial-based UDA.

Citations: 0
CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649001
Ziyang Gong, Zhixiang Wei, Di Wang, Xiaoxing Hu, Xianzheng Ma, Hongruixuan Chen, Yuru Jia, Yupeng Deng, Zhenming Ji, Xiangwei Zhu, Xue Yang, Naoto Yokoya, Jing Zhang, Bo Du, Junchi Yan, Liangpei Zhang

Due to the substantial domain gaps in Remote Sensing (RS) images that are characterized by variabilities such as location, wavelength, and sensor type, Remote Sensing Domain Generalization (RSDG) has emerged as a critical and valuable research frontier, focusing on developing models that generalize effectively across diverse scenarios. However, research in this area remains underexplored: (1) Current cross-domain methods primarily focus on Domain Adaptation (DA), which adapts models to predefined domains rather than to unseen ones; (2) Few studies target the RSDG issue, especially for semantic segmentation tasks. Existing related models are developed for specific unknown domains, struggling with issues of underfitting on other unseen scenarios; (3) Existing RS foundation models tend to prioritize in-domain performance over cross-domain generalization. To this end, we introduce the first vision foundation model for RSDG semantic segmentation, CrossEarth. CrossEarth demonstrates strong cross-domain generalization through a specially designed data-level Earth-Style Injection pipeline and a model-level Multi-Task Training pipeline. In addition, for the semantic segmentation task, we have curated an RSDG benchmark comprising 32 semantic segmentation scenarios across various regions, spectral bands, platforms, and climates, providing comprehensive evaluations of the generalizability of future RSDG models. Extensive experiments on this collection demonstrate the superiority of CrossEarth over existing state-of-the-art methods.

Citations: 0
Bi-C2R: Bidirectional Continual Compatible Representation for Re-Indexing Free Lifelong Person Re-Identification.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649078
Zhenyu Cui, Jiahuan Zhou, Yuxin Peng

Lifelong person Re-IDentification (L-ReID) exploits sequentially collected data to continuously train and update a ReID model, focusing on the overall performance across all data. Its main challenge is to avoid catastrophic forgetting of old knowledge while training on new data. Existing L-ReID methods typically re-extract features for all historical gallery images after each update, a process known as "re-indexing". However, historical gallery data typically cannot be stored directly due to data privacy concerns, and re-indexing large-scale gallery images is costly. As a result, retrieval inevitably becomes incompatible between query features extracted by the updated model and gallery features extracted before the update, greatly impairing re-identification performance. To tackle this issue, this paper focuses on a new task called Re-index Free Lifelong person Re-IDentification (RFL-ReID), which requires performing lifelong person re-identification without re-indexing historical gallery images. RFL-ReID is therefore more challenging than L-ReID: it requires continuously learning and balancing new and old knowledge from diverse streaming data while keeping the features output by the new and old models mutually compatible. To this end, we propose a Bidirectional Continual Compatible Representation (Bi-C2R) framework that continuously updates the gallery features extracted by the old model to perform efficient L-ReID in a compatible manner. Specifically, a bidirectional compatible transfer network is first designed to bridge the relationship between new and old knowledge and continuously map old gallery features into the new feature space after each update. Second, a bidirectional compatible distillation module and a bidirectional anti-forgetting distillation module are designed to balance the compatibility between new and old knowledge in dual feature spaces. Third, a feature-level exponential moving average strategy is designed to adaptively fill the diverse knowledge gaps between different data domains. Finally, we verify the proposed Bi-C2R method through theoretical analysis and extensive experiments on multiple benchmarks, which demonstrate that it achieves leading performance on both the introduced RFL-ReID task and the traditional L-ReID task.

Citations: 0
Continuous Review and Timely Correction: Enhancing the Resistance to Noisy Labels Via Self-Not-True and Class-Wise Distillation.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649111
Long Lan, Jingyi Wang, Xinghao Wu, Bo Han, Xinwang Liu

Deep neural networks possess remarkable learning capabilities and expressive power, but this makes them vulnerable to overfitting, especially when they encounter mislabeled data. A notable phenomenon called the memorization effect occurs when networks first learn the correctly labeled data and later memorize the mislabeled instances. While early stopping can mitigate overfitting, it doesn't entirely prevent networks from adapting to incorrect labels during the initial training phases, which can result in losing valuable insights from accurate data. Moreover, early stopping cannot rectify the mistakes caused by mislabeled inputs, underscoring the need for improved strategies. In this paper, we introduce an innovative mechanism for continuous review and timely correction of learned knowledge. Our approach allows the network to repeatedly revisit and reinforce correct information while promptly addressing any inaccuracies stemming from mislabeled data. We present a novel method called self-not-true-distillation (SNTD). This technique employs self-distillation, where the network from previous training iterations acts as a teacher, guiding the current network to review and solidify its understanding of accurate labels. Crucially, SNTD masks the true class label in the logits during this process, concentrating on the non-true classes to correct any erroneous knowledge that may have been acquired. We also recognize that different data classes follow distinct learning trajectories. A single teacher network might struggle to effectively guide the learning of all classes at once, which necessitates selecting different teacher networks for each specific class. Additionally, the influence of the teacher network's guidance varies throughout the training process. To address these challenges, we propose SNTD+, which integrates a class-wise distillation strategy along with a dynamic weight adjustment mechanism. Together, these enhancements significantly bolster SNTD's robustness in tackling complex scenarios characterized by label noise.

Citations: 0
Spike Camera Optical Flow Estimation Based on Continuous Spike Streams.
IF 18.6 Pub Date: 2025-12-29 DOI: 10.1109/TPAMI.2025.3649050
Rui Zhao, Ruiqin Xiong, Dongkai Wang, Shiyu Xuan, Jian Zhang, Xiaopeng Fan, Tiejun Huang

Spike camera is an emerging bio-inspired vision sensor with ultra-high temporal resolution. It records scenes by accumulating photons and outputting binary spike streams. Optical flow estimation aims to estimate pixel-level correspondences between different moments, describing motion information over time, which is a key task for spike cameras. High-quality optical flow is important since motion information is a foundation for analyzing spikes. However, extracting stable light-intensity information from spikes is difficult due to the randomness of binary spikes. Besides, the continuity of spikes can offer contextual information for optical flow. In this paper, we propose a network, Spike2Flow++, to estimate optical flow for spike cameras. In Spike2Flow++, we propose a differential of spike firing time (DSFT) to represent information in binary spikes. Moreover, we propose a dual DSFT representation and a dual correlation construction to extract stable light-intensity information for reliable correlations. To use the continuity of spikes as motion contextual information, we propose a joint correlation decoding (JCD) that jointly estimates a series of flow fields. To adaptively fuse different motions in JCD, we propose a global motion bank aggregation that constructs an information bank for all motions and adaptively extracts contexts from the bank at each iteration during the recurrent decoding of each motion. To train and evaluate our network, we construct Real Scenes with Spikes and Flow++ (RSSF++), a dataset built from real-world scenes. Experiments demonstrate that our Spike2Flow++ achieves state-of-the-art performance on RSSF++, photo-realistic high-speed motion (PHM), and real-captured data.

Citations: 0
Neural Eigenfunctions Are Structured Representation Learners.
IF 18.6 Pub Date: 2025-10-27 DOI: 10.1109/TPAMI.2025.3625728
Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu

This paper revisits the canonical concept of learning structured representations without label supervision via eigendecomposition. Yet, unlike prior spectral methods such as the Laplacian Eigenmap, which operate in a nonparametric manner, we aim to parametrically model the principal eigenfunctions of an integral operator defined by a kernel and a data distribution using a neural network, for enhanced scalability and reasonable out-of-sample generalization. To achieve this goal, we first present a new series of objective functions that generalize the EigenGame [1] to function space for learning neural eigenfunctions. We then show that, when the similarity metric is derived from positive relations in a data augmentation setup, a representation learning objective function emerges that resembles those of popular self-supervised learning methods, with an additional symmetry-breaking property for producing structured representations in which features are ordered by importance. We call such a structured, adaptive-length deep representation a Neural Eigenmap. We demonstrate the use of Neural Eigenmaps as adaptive-length codes in image retrieval systems. By truncation according to feature importance, our method requires up to $16\times$ shorter representation length than leading self-supervised learning methods to achieve similar retrieval performance. We further apply our method to graph data and report strong results on a node representation learning benchmark with more than one million nodes.

Citations: 0
Affine Correspondences Between Multi-Camera Systems for Relative Pose Estimation.
IF 18.6 Pub Date: 2025-10-27 DOI: 10.1109/TPAMI.2025.3626134
Banglei Guan, Ji Zhao
We present a novel method to compute the relative pose of multi-camera systems using two affine correspondences (ACs). Existing solutions to multi-camera relative pose estimation are either restricted to special cases of motion, have excessively high computational complexity, or require too many point correspondences (PCs). Thus, these solvers impede efficient and accurate relative pose estimation when applying RANSAC as a robust estimator. This paper shows that the 6DOF relative pose estimation problem using ACs permits a feasible minimal solution when the geometric constraints between ACs and multi-camera systems are exploited using a special parameterization. We present a problem formulation based on two ACs that encompasses the two common types of ACs across two views, i.e., inter-camera and intra-camera. Moreover, we exploit a unified and versatile framework for generating 6DOF solvers. Building upon this foundation, we use this framework to address two categories of practical scenarios. First, for the more challenging 7DOF relative pose estimation problem, where the scale transformation of multi-camera systems is unknown, we propose 7DOF solvers to compute the relative pose and scale using three ACs. Second, leveraging inertial measurement units (IMUs), we introduce several minimal solvers for constrained relative pose estimation problems. These include 5DOF solvers with known relative rotation angle and a 4DOF solver with known vertical direction. Experiments on both virtual and real multi-camera systems prove that the proposed solvers are more efficient than state-of-the-art algorithms, while resulting in better relative pose accuracy. The source code is available at https://github.com/jizhaox/relpose-mcs-depth.
Citations: 0