Pub Date: 2026-03-23 | DOI: 10.1109/tpami.2026.3676710
Haichao Zhang, Yi Xu, Yun Fu
Trajectory prediction is a fundamental problem in computer vision, vision-language-action models, world models, and autonomous systems, with broad impact on applications including autonomous driving, robotics, and surveillance. Most existing approaches assume observations are complete and relatively clean, and thus do not adequately address out-of-sight agents or the intrinsic noise in sensing modalities (e.g., sensor measurements) caused by restricted camera coverage, occlusions, and the lack of ground-truth denoised trajectories. These factors introduce substantial safety concerns and reduce the robustness of trajectory prediction in practical deployments. In this extended study, we introduce major improvements to Out-of-Sight Trajectory (OST), a new task aimed at predicting noise-free visual trajectories of out-of-sight objects from noisy sensor observations. Based on our prior work, we expand the setting of Out-of-Sight Trajectory Prediction (OOSTraj) from pedestrians to both pedestrians and vehicles, thereby increasing its relevance to autonomous driving, robotics, and surveillance scenarios. Our improved Vision-Positioning Denoising Module utilizes camera calibration to construct a vision-position correspondence, mitigating the absence of direct visual cues while enabling effective unsupervised denoising of noisy sensor signals. Extensive experiments on the Vi-Fi and JRDB datasets demonstrate that our method achieves state-of-the-art results for both trajectory denoising and trajectory prediction, with clear gains over prior baselines. We further provide comparisons against classical denoising techniques, including Kalman filtering, and adapt recent trajectory prediction models to this setting, establishing a stronger and more comprehensive benchmark.
To the best of our knowledge, this is the first work to incorporate vision-positioning projection to denoise noisy sensor trajectories of out-of-sight agents, opening new directions for future research in this area. The code and preprocessed datasets are available at https://github.com/Hai-chao-Zhang/OST.
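The abstract cites Kalman filtering as a classical denoising baseline. As a point of reference only (this is not the paper's Vision-Positioning method), here is a minimal sketch of a constant-velocity Kalman filter smoothing a noisy 1-D sensor track; the function name and noise parameters are illustrative assumptions.

```python
import numpy as np

def kalman_denoise(zs, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over a noisy 1-D position track.
    State is [position, velocity]; only position is observed."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
    H = np.array([[1.0, 0.0]])              # observation model
    Q = q * np.eye(2)                       # process-noise covariance (assumed)
    R = np.array([[r]])                     # measurement-noise covariance (assumed)
    x = np.array([zs[0], 0.0])              # initial state
    P = np.eye(2)                           # initial state covariance
    out = []
    for z in zs:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)

rng = np.random.default_rng(0)
true = np.linspace(0.0, 10.0, 50)           # agent moving at constant speed
noisy = true + rng.normal(0.0, 0.5, 50)     # noisy sensor readings
smooth = kalman_denoise(noisy)
print(np.abs(noisy - true).mean(), np.abs(smooth - true).mean())
```

Because the constant-velocity model matches the simulated motion, the filtered track ends up closer to the ground truth than the raw sensor readings; a learned denoiser like the paper's would target settings where no such simple motion model holds.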
Title: Out-of-Sight Embodied Agents: Multimodal Tracking, Sensor Fusion, and Trajectory Forecasting (IEEE Transactions on Pattern Analysis and Machine Intelligence)
Pub Date: 2026-03-23 | DOI: 10.1109/tpami.2026.3676894
Zemin Huang, Weijian Luo, Zhengyang Geng, Guojun Qi
Despite strong performance on many generative tasks, diffusion and flow matching models require a large number of sampling steps to generate high-quality images. This has motivated the community to develop effective methods to distill pre-trained models into more efficient models. In this paper, we present Implicit Generator Matching (IGM), a systematic approach to distill pre-trained diffusion/flow matching models into one-step generator models, while maintaining almost the same sample generation ability as the original model, as well as being data-free with no need for training images. The key challenge is that the traditional diffusion/flow-matching loss is intractable for distilling a teacher diffusion/flow model with an explicitly defined field into a student generator, whose field is defined implicitly. The main breakthrough, our Implicit Gradient Theorem, provides an exact and efficient gradient to directly optimize the student by aligning this implicit field with the teacher's. IGM shows strong empirical performance for one-step generators, setting new standards. On CIFAR10, our diffusion-based SIM achieves an FID score of 2.06, while flow-based FGM sets a flow-model record with a 3.08 FID. Scaling to text-to-image models, SIM distillation of PixArt-$\alpha$ yields a leading 6.42 aesthetic score, surpassing SDXL-TURBO (5.33), and FGM distillation of SD3 achieves a competitive 0.65 GenEval score against multi-step accelerators like Hyper-SD3 (0.63).
Title: One-Step Diffusion and Flow Distillation through Implicit Generator Matching
Large graphs are becoming ubiquitous, presenting significant computational hurdles in data processing and analysis. Graph coarsening algorithms are frequently employed to condense large graphs while preserving key graph properties. Real-world graphs also have features or contexts associated with each node. However, existing coarsening methods often overlook simultaneity across node features and structural information. Recent approaches to alleviate this limitation are computationally intensive and primarily suited for homophilic datasets. Most existing approaches are unsuitable for streaming and evolving graphs, as they require recomputation of the coarsened graph at every timestamp. In this paper, we introduce a Fast and Scalable Hashing-Based Universal Graph Coarsening (UGC) framework that integrates locality-sensitive hashing and feature augmentation to effectively coarsen graphs. UGC is exceptionally fast, straightforward to implement, and capable of handling homophilic, heterophilic, and streaming graphs, making it a truly universal solution for graph coarsening. We use an optimization-based framework to minimize a constrained $\epsilon$-similarity between the original and coarsened graphs, where $\epsilon$ lies between zero and one. Through extensive experimentation on real and synthetic datasets, we demonstrate the effectiveness of our approach in terms of improved runtime complexity and generalization to heterophilic and streaming graphs.
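A minimal sketch of the locality-sensitive hashing idea that such a framework builds on (not the authors' implementation): nodes whose feature vectors fall on the same side of every random hyperplane hash to the same bucket, and each bucket can act as a supernode of the coarsened graph. The function name and parameters here are illustrative assumptions.

```python
import numpy as np

def lsh_coarsen(X, n_planes=4, seed=0):
    """Assign each node (row of X) to a supernode via random-projection LSH.
    Nodes sharing the full sign pattern across all hyperplanes share a bucket."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, X.shape[1]))   # random hyperplanes
    codes = (X @ planes.T > 0).astype(int)             # one sign bit per plane
    buckets = {}
    labels = np.empty(len(X), dtype=int)
    for i, code in enumerate(map(tuple, codes)):
        labels[i] = buckets.setdefault(code, len(buckets))
    return labels

# Two well-separated feature clusters should collapse into two supernodes.
rng = np.random.default_rng(1)
X = np.vstack([np.full((5, 8), 1.0), np.full((5, 8), -1.0)])
X += 0.01 * rng.normal(size=X.shape)
labels = lsh_coarsen(X)
print(labels)
```

Hashing is a single matrix product plus a bucket lookup, which is what makes this style of coarsening fast and naturally incremental for streaming graphs: a newly arriving node is hashed independently, with no recomputation of existing supernodes.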
Title: Fast and Scalable Hashing-Based Universal Graph Coarsening
Mohit Kataria, Nikita Malik, Jayadeva, Sandeep Kumar
Pub Date: 2026-03-23 | DOI: 10.1109/tpami.2026.3676633
Pub Date: 2026-03-20 | DOI: 10.1109/tpami.2026.3672916
Ruiwen Yuan, Yongqiang Tang, Wensheng Zhang
The remarkable success of GNNs has brought the challenge of high computational and memory overhead when training with large-scale graphs. As a promising solution, graph condensation is committed to constructing synthetic graphs of significantly smaller size, which are expected to preserve the essential characteristics of the original ones. During this process, a core problem is how to accurately portray and align the data distribution structures between the original graph space and the synthetic graph space. A mainstream idea in existing research is matching the class distributions between the two spaces. Unfortunately, such methods generally overlook two key issues: 1) heterophilic nodes in original graphs may render the class distribution patterns chaotic; 2) coarse-grained matching of the overall class centroid between original and synthetic spaces is insufficient for data with complex subcategory distributions. In this paper, we propose a novel Graph Condensation method via homophily node Refinement and fine-grained class Distribution matching (GCRD). Given the original large-scale graph, we first distinguish the nodes into advantageous homophilic nodes and detrimental heterophilic nodes, followed by adaptively assigning node weights to refine the generated class distribution patterns of the original graphs. Furthermore, with the refined class distribution patterns, we propose a fine-grained distribution matching objective to more delicately align the local distribution structure of subclasses within each class. Rigorous theoretical analysis confirms the effectiveness of our proposal in precisely learning the class information. Extensive experiments demonstrate our state-of-the-art classification and cross-architecture generalization performance against various baselines.
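A toy illustration of the homophilic/heterophilic distinction this method hinges on (not the paper's refinement procedure): per-node homophily is the fraction of a node's neighbors that share its class label, and thresholding it separates the advantageous homophilic nodes from the detrimental heterophilic ones. All names here are illustrative.

```python
from collections import defaultdict

def node_homophily(edges, labels):
    """Return, for each node, the fraction of its neighbors sharing its label.
    Isolated nodes are assigned 1.0 by convention."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    h = {}
    for u, lab in enumerate(labels):
        ns = nbrs[u]
        h[u] = sum(labels[v] == lab for v in ns) / len(ns) if ns else 1.0
    return h

# Path graph 0-1-2-3; node 3 carries a different class than its neighbor.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 0, 0, 1]
h = node_homophily(edges, labels)
print(h)  # node 3's only neighbor has a different label -> 0.0
```

A weighting scheme in the spirit of the abstract would then down-weight low-homophily nodes when forming class distribution statistics, so that heterophilic nodes do not smear the class centroids being matched.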
Title: Graph Condensation via Homophily Node Refining and Fine-Grained Distribution Matching
Knowledge distillation (KD) has become a fundamental technique for model compression in object detection tasks. Data noise and training randomness may cause the knowledge of the teacher model to be unreliable, referred to as knowledge uncertainty. Existing methods neglect this uncertainty, potentially hindering the student's capacity to capture and understand latent "dark knowledge". In this work, we introduce a novel strategy that explicitly incorporates knowledge uncertainty, named Uncertainty-Driven Knowledge Extraction and Transfer (UET). Given the unknown, high-dimensional nature of the knowledge distribution, we employ Monte Carlo dropout to effectively estimate the teacher's uncertainty. Leveraging information theory, we combine uncertainty with deterministic knowledge, enabling the student to benefit from both precision and diversity. UET is a plug-and-play method that integrates seamlessly with existing distillation techniques. We validate our approach through comprehensive experiments across various distillation strategies, detectors, and backbones. Specifically, UET achieves state-of-the-art results, with a ResNet50-based GFL detector obtaining 44.1% mAP on the COCO dataset, surpassing the baseline by 3.9%.
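A minimal sketch of the Monte Carlo dropout idea the abstract relies on, on a toy linear layer rather than the paper's detector: dropout stays active at inference, and the spread across T stochastic forward passes serves as the uncertainty estimate. The function name, dropout rate, and layer are illustrative assumptions.

```python
import numpy as np

def mc_dropout_predict(x, W, p=0.5, T=200, seed=0):
    """Monte Carlo dropout on a toy linear layer y = x @ W.
    Runs T stochastic passes with input units dropped at rate p;
    returns the predictive mean and variance (the uncertainty proxy)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(T):
        mask = rng.random(x.shape) > p              # drop each unit with prob p
        preds.append(((x * mask) @ W) / (1.0 - p))  # inverted-dropout scaling
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.var(axis=0)

x = np.ones(4)            # toy input features
W = np.ones((4, 1))       # toy "teacher head" weights
mean, var = mc_dropout_predict(x, W)
print(mean, var)
```

The predictive mean stays near the deterministic output (4.0 here, thanks to the 1/(1-p) rescaling) while the variance is strictly positive; in a UET-style setup, that variance over the teacher's outputs would weight how strongly each piece of teacher knowledge is transferred.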
Title: Distilling Object Detectors via Monte Carlo Dropout
Junfei Yi, Hui Zhang, Jianxu Mao, Tengfei Liu, Mingjie Li, Sihao Lin, Hanyu Gu, Zhihui Li, Xiaojun Chang, Yaonan Wang
Pub Date: 2026-03-18 | DOI: 10.1109/tpami.2026.3674980
Pub Date: 2026-03-17 | DOI: 10.1109/tpami.2026.3674984
Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Bo Han, Zeke Xie
Title: Improved and Accelerated Text-to-Image Generation with Collect, Reflect, and Refine (no abstract available)
Pub Date: 2026-03-17 | DOI: 10.1109/tpami.2026.3675022
Liping Deng, MingQing Xiao
Title: Bridging Datasets and Hyperparameters: GCN-Based Link Prediction for Recommendation (no abstract available)
Pub Date: 2026-03-17 | DOI: 10.1109/tpami.2026.3674742
Yuanwei Liu, Nian Liu, Tao Jiang, Yi Wu, Xiwen Yao, Junwei Han
Title: Beyond Support Samples: Incorporating Unlabeled Queries for Few-Shot Semantic Segmentation (no abstract available)