IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Articles

Out-of-Sight Embodied Agents: Multimodal Tracking, Sensor Fusion, and Trajectory Forecasting
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-23 DOI: 10.1109/tpami.2026.3676710
Haichao Zhang, Yi Xu, Yun Fu
Trajectory prediction is a fundamental problem in computer vision, vision-language-action models, world models, and autonomous systems, with broad impact on applications including autonomous driving, robotics, and surveillance. Most existing approaches assume observations are complete and relatively clean, and thus do not adequately address out-of-sight agents or the intrinsic noise in sensing modalities (e.g., sensor measurements) caused by restricted camera coverage, occlusions, and the lack of ground-truth denoised trajectories. These factors introduce substantial safety concerns and reduce the robustness of trajectory prediction in practical deployments. In this extended study, we introduce major improvements to Out-of-Sight Trajectory (OST), a new task aimed at predicting noise-free visual trajectories of out-of-sight objects from noisy sensor observations. Building on our prior work, we expand the setting of Out-of-Sight Trajectory Prediction (OOSTraj) from pedestrians to both pedestrians and vehicles, thereby increasing its relevance to autonomous driving, robotics, and surveillance scenarios. Our improved Vision-Positioning Denoising Module uses camera calibration to construct a vision-position correspondence, mitigating the absence of direct visual cues while enabling effective unsupervised denoising of noisy sensor signals. Extensive experiments on the Vi-Fi and JRDB datasets demonstrate that our method achieves state-of-the-art results for both trajectory denoising and trajectory prediction, with clear gains over prior baselines. We further provide comparisons against classical denoising techniques, including Kalman filtering, and adapt recent trajectory prediction models to this setting, establishing a stronger and more comprehensive benchmark. To the best of our knowledge, this is the first work to incorporate vision-positioning projection to denoise noisy sensor trajectories of out-of-sight agents, opening new directions for future research in this area. The code and preprocessed datasets are available at https://github.com/Hai-chao-Zhang/OST.
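The abstract says the denoising module relies on camera calibration to relate positions to image coordinates. As a rough illustration only, the sketch below projects 3D positions into pixel coordinates with a standard pinhole camera model. The function name `project_to_image`, the parameter names, and the sample values are all assumptions made for this example; the paper's actual module is learned and may work differently.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_to_image(
    point_world: Vec3,
    rotation: Sequence[Sequence[float]],  # 3x3 extrinsic rotation, world -> camera
    translation: Vec3,                    # extrinsic translation, world -> camera
    fx: float, fy: float,                 # focal lengths in pixels
    cx: float, cy: float,                 # principal point in pixels
) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates with a pinhole model."""
    # Camera-frame coordinates: X_c = R @ X_w + t
    xc, yc, zc = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then the intrinsic mapping to pixels.
    return fx * xc / zc + cx, fy * yc / zc + cy


# Hypothetical calibration: camera at the world origin, looking along +z.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
track = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]  # made-up sensor positions in metres
pixels = [
    project_to_image(p, IDENTITY, (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0)
    for p in track
]
print(pixels)
```

A projected point can fall outside the image bounds, which is one simple way an agent can be located by a positioning sensor while remaining out of the camera's sight.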
Citations: 0
One-Step Diffusion and Flow Distillation through Implicit Generator Matching
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-23 DOI: 10.1109/tpami.2026.3676894
Zemin Huang, Weijian Luo, Zhengyang Geng, Guojun Qi
Despite strong performance on many generative tasks, diffusion and flow matching models require a large number of sampling steps to generate high-quality images. This has motivated the community to develop effective methods to distill pre-trained models into more efficient models. In this paper, we present Implicit Generator Matching (IGM), a systematic approach to distill pre-trained diffusion and flow matching models into one-step generator models while maintaining almost the same sample generation ability as the original model. The approach is also data-free and needs no training images. The key challenge is that the traditional diffusion/flow-matching loss is intractable when distilling a teacher diffusion/flow model with an explicitly defined field into a student generator whose field is defined implicitly. The main breakthrough, our Implicit Gradient Theorem, provides an exact and efficient gradient to directly optimize the student by aligning this implicit field with the teacher's. IGM shows strong empirical performance for one-step generators, setting new standards. On CIFAR10, our diffusion-based SIM achieves an FID score of 2.06, while flow-based FGM sets a flow-model record with a 3.08 FID. Scaling to text-to-image models, SIM distillation of PixArt-$\alpha$ yields a leading 6.42 aesthetic score, surpassing SDXL-TURBO (5.33), and FGM distillation of SD3 achieves a competitive 0.65 GenEval score against multi-step accelerators like Hyper-SD3 (0.63).
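To make the cost of multi-step sampling concrete, here is a minimal sketch of fixed-step Euler integration of a flow ODE. It is not the IGM method. `toy_velocity` is a hypothetical stand-in for a trained network, chosen because its exact solution is known, and `euler_sample` is a name invented for this example.

```python
import math
from typing import Callable, Tuple


def euler_sample(
    x0: float,
    velocity: Callable[[float, float], float],
    num_steps: int,
) -> Tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of velocity evaluations used.
    """
    x, dt, evals = x0, 1.0 / num_steps, 0
    for k in range(num_steps):
        x += dt * velocity(x, k * dt)  # one "network call" per step
        evals += 1
    return x, evals


def toy_velocity(x: float, t: float) -> float:
    # Assumed toy field; the exact solution is x(t) = x0 * exp(-t).
    return -x


exact = math.exp(-1.0)
for n in (1, 4, 100):
    x1, evals = euler_sample(1.0, toy_velocity, n)
    print(f"steps={n:3d} evaluations={evals:3d} error={abs(x1 - exact):.4f}")
```

In this toy setting the discretization error shrinks only as the number of evaluations grows, which is the cost a one-step student generator is meant to avoid by mapping noise to a sample in a single call.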
Citations: 0
Fast and Scalable Hashing-Based Universal Graph Coarsening
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-23 DOI: 10.1109/tpami.2026.3676633
Mohit Kataria, Nikita Malik, Jayadeva, Sandeep Kumar
Large graphs are becoming ubiquitous, presenting significant computational hurdles in data processing and analysis. Graph coarsening algorithms are frequently employed to condense large graphs while preserving key graph properties. Real-world graphs also have features or contexts associated with each node. However, existing coarsening methods often fail to consider node features and structural information jointly. Recent approaches to alleviate this limitation are computationally intensive and primarily suited for homophilic datasets. Most existing approaches are unsuitable for streaming and evolving graphs, as they require recomputation of the coarsened graph at every timestamp. In this paper, we introduce a Fast and Scalable Hashing-Based Universal Graph Coarsening (UGC) framework that integrates locality-sensitive hashing and feature augmentation to effectively coarsen graphs. UGC is exceptionally fast, straightforward to implement, and capable of handling homophilic, heterophilic, and streaming graphs, making it a truly universal solution for graph coarsening. We use an optimization-based framework to minimize a constrained $\epsilon$ similarity between the original and coarsened graphs, where $\epsilon$ is between zero and one. Through extensive experimentation on real and synthetic datasets, we demonstrate the effectiveness of our approach in terms of improved runtime complexity and generalization to heterophilic and streaming graphs. Furthermore, we showcase its utility in downstream tasks, emphasizing its scalability for training graph neural networks on coarsened graphs from benchmark real-world datasets.
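The general idea of hashing-based coarsening can be sketched as follows: hash each node's feature vector with random hyperplanes, merge nodes that share a hash code into one supernode, and accumulate edge counts between supernodes. This is an assumed, simplified illustration, not the UGC algorithm itself. It omits the feature augmentation step mentioned in the abstract, and `lsh_coarsen` and its parameters are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple


def lsh_coarsen(
    features: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    num_planes: int = 4,
    seed: int = 0,
) -> Tuple[List[int], Dict[Tuple[int, int], int]]:
    """Group nodes by random-hyperplane LSH codes and merge edges between groups.

    Returns (assignment, coarse_edges): assignment[i] is the supernode of node i,
    and coarse_edges maps an unordered supernode pair to its edge count.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

    bucket_to_supernode: Dict[Tuple[bool, ...], int] = {}
    assignment: List[int] = []
    for x in features:
        # One bit per hyperplane: which side of the plane the feature vector lies on.
        code = tuple(sum(w * v for w, v in zip(plane, x)) > 0 for plane in planes)
        supernode = bucket_to_supernode.setdefault(code, len(bucket_to_supernode))
        assignment.append(supernode)

    coarse_edges: Dict[Tuple[int, int], int] = {}
    for u, v in edges:
        a, b = sorted((assignment[u], assignment[v]))
        coarse_edges[(a, b)] = coarse_edges.get((a, b), 0) + 1
    return assignment, coarse_edges


# Toy graph: two pairs of nodes with identical features inside each pair.
toy_features = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
toy_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
assignment, coarse_edges = lsh_coarsen(toy_features, toy_edges)
print(assignment, coarse_edges)
```

Each node is hashed independently in a single pass, so a newly arriving node can be assigned to a supernode without rehashing the rest of the graph. Edges inside a supernode appear here as self-loop counts such as `(0, 0)`.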
Citations: 0
Graph Condensation via Homophily Node Refining and Fine-Grained Distribution Matching
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-20 DOI: 10.1109/tpami.2026.3672916
Ruiwen Yuan, Yongqiang Tang, Wensheng Zhang
The remarkable success of GNNs has brought the challenge of high computational and memory overhead when training with large-scale graphs. As a promising solution, graph condensation aims to construct synthetic graphs of significantly smaller size that are expected to preserve the essential characteristics of the original ones. During this process, a core problem is how to accurately portray and align the data distribution structures between the original graph space and the synthetic graph space. A mainstream idea in existing research is matching the class distributions between the two spaces. Unfortunately, such methods generally overlook two key issues: 1) heterophilic nodes in original graphs may make the class distribution patterns chaotic; 2) coarse-grained matching of the overall class centroid between original and synthetic spaces is insufficient for data with complex subcategory distributions. In this paper, we propose a novel Graph Condensation method via homophily node Refinement and fine-grained class Distribution matching (GCRD). Given the original large-scale graph, we first divide the nodes into advantageous homophilic nodes and detrimental heterophilic nodes, and then adaptively assign node weights to refine the generated class distribution patterns of the original graphs. Furthermore, with the refined class distribution patterns, we propose a fine-grained distribution matching objective to more delicately align the local distribution structure of subclasses within each class. Rigorous theoretical analysis confirms the effectiveness of our proposal in precisely learning the class information. Extensive experiments demonstrate our state-of-the-art classification and cross-architecture generalization performance against various baselines.
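The distinction between homophilic and heterophilic nodes can be illustrated with a per-node homophily ratio, and the effect of node weights with a weighted class centroid. The sketch below uses the homophily ratio directly as the node weight, which is an assumption for illustration only. The paper assigns weights adaptively, and the function names here are hypothetical.

```python
from typing import Dict, List, Sequence


def node_homophily(neighbors: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """Fraction of each node's neighbors sharing its label (0.0 for isolated nodes)."""
    ratios = []
    for node, nbrs in enumerate(neighbors):
        same = sum(1 for n in nbrs if labels[n] == labels[node])
        ratios.append(same / len(nbrs) if nbrs else 0.0)
    return ratios


def weighted_class_centroids(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> Dict[int, List[float]]:
    """Weighted mean feature vector per class; classes with zero total weight are skipped."""
    sums: Dict[int, List[float]] = {}
    totals: Dict[int, float] = {}
    for x, y, w in zip(features, labels, weights):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += w * v
        totals[y] = totals.get(y, 0.0) + w
    return {y: [s / totals[y] for s in acc] for y, acc in sums.items() if totals[y] > 0}


# Toy path graph 1 - 0 - 2 - 3 with labels 0, 0, 1, 1 (made-up data).
toy_neighbors = [[1, 2], [0], [0, 3], [2]]
toy_labels = [0, 0, 1, 1]
toy_features = [[0.0], [2.0], [10.0], [12.0]]

ratios = node_homophily(toy_neighbors, toy_labels)
centroids = weighted_class_centroids(toy_features, toy_labels, ratios)
print(ratios, centroids)
```

Nodes 0 and 2 sit on the class boundary, so they receive lower weights and pull their class centroids less than the purely homophilic nodes 1 and 3.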
Citations: 0
Distilling Object Detectors via Monte Carlo Dropout
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-18 DOI: 10.1109/tpami.2026.3674980
Junfei Yi, Hui Zhang, Jianxu Mao, Tengfei Liu, Mingjie Li, Sihao Lin, Hanyu Gu, Zhihui Li, Xiaojun Chang, Yaonan Wang
Knowledge distillation (KD) has become a fundamental technique for model compression in object detection tasks. Data noise and training randomness may make the teacher model's knowledge unreliable, a phenomenon referred to as knowledge uncertainty. Existing methods neglect this uncertainty, potentially hindering the student's capacity to capture and understand latent "dark knowledge". In this work, we introduce a novel strategy that explicitly incorporates knowledge uncertainty, named Uncertainty-Driven Knowledge Extraction and Transfer (UET). Given the unknown, high-dimensional nature of the knowledge distribution, we employ Monte Carlo dropout to effectively estimate the teacher's uncertainty. Leveraging information theory, we combine uncertainty with deterministic knowledge, enabling the student to benefit from both precision and diversity. UET is a plug-and-play method that integrates seamlessly with existing distillation techniques. We validate our approach through comprehensive experiments across various distillation strategies, detectors, and backbones. Specifically, UET achieves state-of-the-art results, with a ResNet50-based GFL detector obtaining 44.1% mAP on the COCO dataset, surpassing baseline performance by 3.9%.
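Monte Carlo dropout, as generally understood, keeps dropout active at inference and treats the spread of repeated stochastic forward passes as an uncertainty estimate. The sketch below applies that idea to a toy linear scorer rather than a detector. `mc_dropout_predict`, the weights, and the inputs are assumptions for illustration and do not reproduce UET.

```python
import random
from statistics import mean, pvariance
from typing import Sequence, Tuple


def mc_dropout_predict(
    x: Sequence[float],
    weights: Sequence[float],
    drop_prob: float,
    num_samples: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Run a linear scorer num_samples times with dropout left on at inference.

    Returns (mean prediction, variance across passes). Requires drop_prob < 1.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    outputs = []
    for _ in range(num_samples):
        # Fresh Bernoulli mask per pass; kept inputs are rescaled (inverted dropout).
        masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
        outputs.append(sum(w * m for w, m in zip(weights, masked)))
    return mean(outputs), pvariance(outputs)


toy_input = [1.0, 2.0]      # made-up feature vector
toy_weights = [0.5, 0.25]   # made-up "teacher" weights

det_mean, det_var = mc_dropout_predict(toy_input, toy_weights, 0.0, 10)
mc_mean, mc_var = mc_dropout_predict(toy_input, toy_weights, 0.5, 50)
print(det_mean, det_var, mc_mean, mc_var)
```

With `drop_prob=0.0` every pass is identical and the variance is zero. With dropout enabled, the variance across passes serves as a simple proxy for how much the teacher's output can be trusted on that input.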
Citations: 0
Collaborative Multi-Modal Coding for High-Quality 3D Generation
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-17 DOI: 10.1109/tpami.2026.3674943
Ziang Cao, Zhaoxi Chen, Liang Pan, Ziwei Liu
Citations: 0
Improved and Accelerated Text-to-Image Generation with Collect, Reflect, and Refine
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-17 DOI: 10.1109/tpami.2026.3674984
Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Bo Han, Zeke Xie
Citations: 0
Bridging Datasets and Hyperparameters: GCN-Based Link Prediction for Recommendation
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-17 DOI: 10.1109/tpami.2026.3675022
Liping Deng, MingQing Xiao
Citations: 0
Beyond Support Samples: Incorporating Unlabeled Queries for Few-Shot Semantic Segmentation
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-17 DOI: 10.1109/tpami.2026.3674742
Yuanwei Liu, Nian Liu, Tao Jiang, Yi Wu, Xiwen Yao, Junwei Han
Citations: 0
Mirror Descent Safe Policy Optimization for Reinforcement Learning Agents
IF 23.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-17 DOI: 10.1109/tpami.2026.3674995
Renzhi Lu, Ning Wu, Qingqing Xiong, Yifang Shi, Dongrui Wu, Tao Yang, Yaochu Jin, Lihua Xie
Citations: 0