
Latest Publications in IEEE Transactions on Pattern Analysis and Machine Intelligence

Isolating Signals in Passive Non-Line-of-Sight Imaging using Spectral Content.
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-08-02 DOI: 10.1109/TPAMI.2023.3301336
Connor Hashemi, Rafael Avelar, James Leger

In real-life passive non-line-of-sight (NLOS) imaging, there is an overwhelming amount of undesired scattered radiance, called clutter, that impedes reconstruction of the desired NLOS scene. This paper explores using the spectral domain of the scattered light field to separate the desired scattered radiance from the clutter. We propose two techniques. The first separates the multispectral scattered radiance into a collection of objects, each with its own uniform color. The objects corresponding to clutter can then be identified and removed based on how well they can be reconstructed using NLOS imaging algorithms. This technique requires very few priors and uses off-the-shelf algorithms. For the second technique, we derive and solve a convex optimization problem assuming we know the desired signal's spectral content. This method is quicker and can be performed with fewer spectral measurements. We demonstrate both techniques in realistic scenarios. In the presence of clutter that is 50 times stronger than the desired signal, the proposed reconstruction of the NLOS scene is 23 times more accurate than typical reconstructions and 5 times more accurate than the leading clutter-rejection method.
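The second technique can be pictured with a toy spectral-unmixing example. The sketch below is our own illustration under simplifying assumptions: a plain least-squares fit stands in for the paper's convex program, and all spectra and abundances are synthetic.

```python
# Hedged sketch: separate a desired NLOS signal from clutter by spectral
# unmixing, assuming the signal spectrum `s_sig` is known (as in the paper's
# second technique) and clutter spectra `S_clut` have been estimated somehow.
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_pix = 8, 1000

s_sig = rng.random(n_bands)               # assumed known signal spectrum
S_clut = rng.random((n_bands, 3))         # hypothetical clutter spectra

# Ground-truth abundances, used only to synthesize measurements.
a_sig = rng.random(n_pix)
A_clut = 50.0 * rng.random((3, n_pix))    # clutter ~50x stronger than signal

Y = (np.outer(s_sig, a_sig) + S_clut @ A_clut
     + 0.01 * rng.standard_normal((n_bands, n_pix)))

# Least-squares unmixing: solve Y ~= [s_sig | S_clut] @ abundances per pixel.
M = np.column_stack([s_sig, S_clut])
abund, *_ = np.linalg.lstsq(M, Y, rcond=None)

signal_only = np.outer(s_sig, abund[0])   # radiance attributed to the signal
print("mean abundance error:", np.abs(abund[0] - a_sig).mean())
```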

Citations: 0
Digging Into Uncertainty-based Pseudo-label for Robust Stereo Matching
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-31 DOI: 10.48550/arXiv.2307.16509
Zhelun Shen, Xibin Song, Yuchao Dai, Dingfu Zhou, Zhibo Rao, Liangjun Zhang
Due to domain differences and an unbalanced disparity distribution across multiple datasets, current stereo matching approaches are commonly limited to a specific dataset and generalize poorly to others. This domain shift issue is usually addressed by substantial adaptation on costly target-domain ground-truth data, which cannot be easily obtained in practical settings. In this paper, we propose to dig into uncertainty estimation for robust stereo matching. Specifically, to balance the disparity distribution, we employ pixel-level uncertainty estimation to adaptively adjust the disparity search space of the next stage, driving the network to progressively prune out the space of unlikely correspondences. Then, to cope with the scarcity of ground-truth data, an uncertainty-based pseudo-label is proposed to adapt the pre-trained model to the new domain: pixel-level and area-level uncertainty estimation filter out the high-uncertainty pixels of predicted disparity maps and generate sparse yet reliable pseudo-labels to bridge the domain gap. Experimentally, our method shows strong cross-domain, adaptation, and joint generalization, and obtained 1st place on the stereo task of the Robust Vision Challenge 2020. Additionally, our uncertainty-based pseudo-labels can be extended to train monocular depth estimation networks in an unsupervised way, achieving performance comparable to supervised methods.
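As a rough illustration of the pseudo-labeling step, the sketch below masks out high-uncertainty pixels to leave a sparse but reliable label map. It is our own simplification: the thresholds, patch size, and area-level rule are illustrative stand-ins, not the paper's settings.

```python
# Hedged sketch: build sparse pseudo-labels from a predicted disparity map
# and a per-pixel uncertainty map by discarding unreliable pixels.
import numpy as np

def sparse_pseudo_labels(disparity, uncertainty, pixel_thresh=0.3,
                         area_frac=0.5, patch=16):
    """Keep a pixel only if (a) its own uncertainty is low and (b) its local
    patch is mostly low-uncertainty (a stand-in for the paper's area-level
    check). Discarded pixels are marked NaN."""
    h, w = disparity.shape
    keep = uncertainty < pixel_thresh                 # pixel-level filter
    for y in range(0, h, patch):                      # area-level filter
        for x in range(0, w, patch):
            block = keep[y:y + patch, x:x + patch]
            if block.mean() < area_frac:              # patch too unreliable
                block[:] = False
    return np.where(keep, disparity, np.nan)

disp = np.random.rand(64, 128) * 192                  # toy disparity map
unc = np.random.rand(64, 128)                         # toy uncertainty map
labels = sparse_pseudo_labels(disp, unc)
print("pseudo-label density:", np.isfinite(labels).mean())
```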
Citations: 0
Supervision by Denoising.
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-28 DOI: 10.1109/TPAMI.2023.3299789
Sean I Young, Adrian V Dalca, Enzo Ferrante, Polina Golland, Christopher A Metzler, Bruce Fischl, Juan Eugenio Iglesias

Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground-truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is a problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to a given imaging problem, which can be extremely time-consuming. In this work, we propose "supervision by denoising" (SUD), a framework that supervises reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging, anatomical brain reconstruction (3D) and cortical parcellation (2D), to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines. Our code is available at https://github.com/seannz/sud.
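A minimal sketch of the alternating scheme on unlabeled data, under our own simplifying assumptions: a toy one-layer "network", mean filtering as the spatial denoiser, and an exponential moving average as the stochastic-averaging step. None of this reflects the authors' actual configuration.

```python
# Hedged sketch of supervision by denoising: the model's temporally averaged
# and spatially smoothed predictions on unlabeled inputs serve as its labels.
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in reconstruction net
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_unlab = torch.rand(4, 1, 32, 32)               # a fixed unlabeled batch
ema_pred = None                                  # running average of predictions
beta = 0.9                                       # averaging rate (illustrative)

def spatial_denoise(x):                          # crude spatial denoiser
    return F.avg_pool2d(x, 3, stride=1, padding=1)

for step in range(100):
    pred = model(x_unlab)
    with torch.no_grad():                        # denoising step: average over
        p = pred.detach()                        # iterations, then smooth
        ema_pred = p if ema_pred is None else beta * ema_pred + (1 - beta) * p
        target = spatial_denoise(ema_pred)
    loss = F.mse_loss(pred, target)              # weight-update step
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full framework these two steps regularize a semi-supervised objective that also includes a supervised loss on the labeled subset; the sketch keeps only the unlabeled branch.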

Citations: 0
Using Zodiacal Light For Spaceborne Calibration Of Polarimetric Imagers.
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-27 DOI: 10.1109/TPAMI.2023.3299526
Or Avitan, Yoav Y Schechner, Ehud Behar

We propose that spaceborne polarimetric imagers can be calibrated, or self-calibrated, using zodiacal light (ZL). ZL is created by a cloud of interplanetary dust particles. It has a significant degree of polarization over a wide field of view. From space, ZL is unaffected by terrestrial disturbances. ZL is insensitive to the camera location, so it is suited for simultaneous cross-calibration of satellite constellations. ZL changes on a scale of months, thus serving as a quasi-constant target in realistic calibration sessions. We derive a forward model for polarimetric image formation. Based on it, we formulate an inverse problem for polarimetric calibration and self-calibration, as well as an algorithm for its solution. The methods are demonstrated in simulations, for which we render polarized images of the sky, including ZL from space, polarimetric disturbances, and imaging noise.
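For intuition, the toy sketch below fits per-channel gains of a four-angle polarimeter against a known reference Stokes vector, using the textbook forward model I = 0.5 g (S0 + S1 cos 2θ + S2 sin 2θ). This is our illustration only: the reference values stand in for ZL, and the paper's forward model and inverse problem are richer than this.

```python
# Hedged sketch: calibrate per-channel polarimeter gains from one observation
# of a reference target with a known Stokes vector (here, a stand-in for ZL).
import numpy as np

thetas = np.deg2rad([0, 45, 90, 135])            # analyzer angles
S_ref = np.array([1.0, 0.15, 0.05])              # assumed known (S0, S1, S2)

true_g = np.array([1.05, 0.97, 1.02, 0.95])      # unknown per-channel gains
ideal = 0.5 * (S_ref[0] + S_ref[1] * np.cos(2 * thetas)
               + S_ref[2] * np.sin(2 * thetas))  # ideal polarimeter response
meas = true_g * ideal + 1e-3 * np.random.randn(4)

g_hat = meas / ideal                             # single target: direct ratio
print("gain estimates:", g_hat)
```

With several ZL observations accumulated over months, the per-channel ratios above would become rows of an overdetermined system solved jointly by least squares.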

Citations: 0
Human Motion Generation: A Survey
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-20 DOI: 10.48550/arXiv.2307.10894
Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang
Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.
Citations: 0
Count-Free Single-Photon 3D Imaging with Race Logic
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-10 DOI: 10.48550/arXiv.2307.04924
A. Ingle, David Maier
Single-photon cameras (SPCs) have emerged as a promising new technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by precisely capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires a large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth- and power-hungry. Can we estimate distances without explicitly storing photon counts? Yes: here we present an online approach for distance estimation suitable for resource-constrained settings with limited bandwidth, memory, and compute. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms as opposed to conventional equi-width histograms. Equi-depth histograms are a more succinct representation for "peaky" distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, another k-quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can reduce bandwidth and power consumption by at least an order of magnitude while maintaining distance-reconstruction accuracy similar to conventional histogram-based processing methods.
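The binner's convergence to a quantile can be imitated in software with a single running register, in the spirit of (though not implementing) the paper's race-logic hardware. Below is a sketch under our own assumptions; the step size and the toy timestamp distribution are illustrative.

```python
# Hedged sketch: track a k-quantile of a photon-timestamp stream without
# storing counts, by nudging one register up or down per sample.
import numpy as np

def streaming_quantile(samples, q=0.5, step=0.5):
    """Nudge the estimate up on samples above it and down on samples below,
    with asymmetric step sizes so it converges to the q-th quantile."""
    est = samples[0]
    for s in samples[1:]:
        est += step * q if s > est else -step * (1 - q)
    return est

rng = np.random.default_rng(1)
# "Peaky" distribution: uniform background plus one strong surface return.
ts = np.concatenate([rng.uniform(0, 100, 2000), rng.normal(42, 0.5, 3000)])
rng.shuffle(ts)
print("median estimate:", streaming_quantile(ts))
print("true median    :", np.median(ts))
```

One plausible cascade feeds each child binner only the samples falling on one side of its parent's estimate, yielding the multi-bin equi-depth histogrammer described above.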
Citations: 1
Towards Scalable Multi-View Reconstruction of Geometry and Materials
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-06-06 DOI: 10.48550/arXiv.2306.03747
Carolin Schmitt, B. Antić, Andrei Neculai, J. Lee, Andreas Geiger
In this paper, we propose a novel method for joint recovery of camera pose, object geometry, and the spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object scale and hence cannot be captured with stationary light stages. The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study of the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and allows for expansion to spatially larger scenes. We believe that this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.
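To make the multi-view consistency idea concrete, here is a small geometric sketch of our own (pinhole cameras, nearest-neighbor lookups, no occlusion handling; this is not the authors' regularizer): it reprojects one keyframe's depth into a neighboring keyframe and measures the depth disagreement that a consistency term would penalize.

```python
# Hedged sketch: depth-consistency residuals between two keyframes.
import numpy as np

def backproject(depth, K):
    """Lift a depth map to 3D points in the camera frame (3 x N)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T
    return np.linalg.inv(K) @ pix * depth.reshape(-1)

def consistency_residual(depth_i, depth_j, K, T_ji):
    """Transform keyframe i's points into frame j and compare the induced
    depths with depth_j at the projected pixels."""
    X_i = backproject(depth_i, K)
    X_j = T_ji[:3, :3] @ X_i + T_ji[:3, 3:4]
    pix = K @ X_j
    u = np.round(pix[0] / pix[2]).astype(int)
    v = np.round(pix[1] / pix[2]).astype(int)
    h, w = depth_j.shape
    ok = (0 <= u) & (u < w) & (0 <= v) & (v < h) & (X_j[2] > 0)
    return X_j[2, ok] - depth_j[v[ok], u[ok]]     # per-point depth residual

K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
depth = np.full((64, 64), 2.0)                    # toy planar keyframe depth
res = consistency_residual(depth, depth, K, np.eye(4))
print("mean |residual|:", np.abs(res).mean())     # ~0 for identical keyframes
```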
Citations: 0
Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-05-31 DOI: 10.48550/arXiv.2305.19868
Yang‐Zhi Hu, Qian Zheng, Xudong Jiang, Gang Pan
Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace the weight multiplications of ANNs with additions, which are more energy-efficient and less computationally intensive. However, training deep SNNs remains a challenge due to the discrete spiking function. A popular approach to circumvent this challenge is ANN-to-SNN conversion; however, due to quantization error and accumulating error, it often requires many time steps (high inference latency) to achieve high performance, which negates SNNs' advantages. To this end, this paper proposes Fast-SNN, which achieves high performance with low latency. We demonstrate an equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the quantization error minimized, we show that the sequential error is the primary cause of the accumulating error, which we address by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Code is available at: https://github.com/yangfan-hu/Fast-SNN.
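The signed IF neuron can be sketched in a few lines. The model below is our assumption of the mechanism, following the common integrate-and-fire formulation with an added negative threshold, not the paper's exact equations: negative spikes let a neuron retract earlier over-firing, which is what suppresses the sequential error.

```python
# Hedged sketch: a signed integrate-and-fire neuron emitting +1/-1 spikes.
import numpy as np

def signed_if(inputs, thr=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v += x                       # integrate the input current
        if v >= thr:
            spikes.append(+1)        # positive spike, soft reset
            v -= thr
        elif v <= -thr:
            spikes.append(-1)        # negative spike cancels over-firing
            v += thr
        else:
            spikes.append(0)
    return np.array(spikes)

x = np.array([0.6, 0.6, -0.9, 0.4, 0.5, -0.3])
s = signed_if(x)
print("spikes:", s)
print("rate-coded output:", s.sum(), "| input sum:", round(float(x.sum()), 2))
```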
Citations: 0
TextSLAM: Visual SLAM with Semantic Planar Text Features
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-05-17 DOI: 10.48550/arXiv.2305.10029
Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei, Wenxian Yu
We propose a novel visual SLAM method that tightly integrates text objects by treating them as semantic features and fully exploring their geometric and semantic priors. Each text object is modeled as a texture-rich planar patch whose semantic meaning is extracted and updated on the fly for better data association. With full exploitation of the locally planar characteristics and semantic meaning of text objects, the SLAM system becomes more accurate and robust even under challenging conditions such as image blurring, large viewpoint changes, and significant illumination variations (day and night). We tested our method in various scenes with ground-truth data. The results show that integrating text features leads to a superior SLAM system that can match images across day and night. The reconstructed semantic 3D text map could be useful for navigation and scene understanding in robotic and mixed-reality applications. (Project page: https://github.com/SJTU-ViSYS/TextSLAM.)
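The planar-patch modeling leans on a standard multi-view geometry result: a 3D plane induces a homography between two calibrated views, H = K (R - t n^T / d) K^{-1}. The sketch below shows that textbook formula; the intrinsics, pose, and plane parameters are made up, and this is our illustration rather than TextSLAM's code.

```python
# Hedged sketch: warp a text-patch pixel between views via the plane-induced
# homography H = K (R - t n^T / d) K^{-1}.
import numpy as np

def plane_homography(K, R, t, n, d):
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R = np.eye(3)                        # relative rotation (identity for the demo)
t = np.array([0.1, 0.0, 0.0])        # relative translation
n = np.array([0.0, 0.0, 1.0])        # text-plane normal in the reference view
d = 2.0                              # plane distance from the reference camera

H = plane_homography(K, R, t, n, d)
p = np.array([320.0, 240.0, 1.0])    # a text-patch pixel in the reference view
q = H @ p
print("warped pixel:", q[:2] / q[2])
```

Warping patches this way is one standard route to comparing a planar text region photometrically across views.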
Citations: 0
Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-04-26 DOI: 10.48550/arXiv.2304.13615
Lukas Hoyer, Dengxin Dai, L. Van Gool
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG. It is enabled by three training strategies to avoid overfitting to the source domain: While (1) Rare Class Sampling mitigates the bias toward common source domain classes, (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA&DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details while models trained with cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA&DG, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention. DAFormer and HRDA significantly improve the state-of-the-art UDA&DG by more than 10 mIoU on 5 different benchmarks. The implementation is available at https://github.com/lhoyer/HRDA.
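As an example of the first strategy, Rare Class Sampling can be written as a temperature-controlled softmax over class rarity. The sketch below follows the paper's description at a high level, but the class frequencies and temperature here are invented for illustration.

```python
# Hedged sketch of Rare Class Sampling: sample source classes with a
# probability that decays with their pixel frequency, so rare classes
# are seen more often during adaptation.
import numpy as np

def rcs_probs(class_freq, T=0.1):
    """Softmax over (1 - f_c) / T: rarer classes get higher probability."""
    f = np.asarray(class_freq, dtype=float)
    logits = (1.0 - f) / T
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

freq = [0.35, 0.30, 0.20, 0.10, 0.04, 0.01]   # made-up pixel frequencies
p = rcs_probs(freq)
rng = np.random.default_rng(0)
c = rng.choice(len(freq), p=p)                # class to sample an image for
print("sampling probs:", np.round(p, 3), "| sampled class:", c)
```

A smaller temperature skews sampling harder toward the rarest classes; T acts as a hyperparameter controlling that trade-off.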
Citations: 0