Pub Date: 2025-12-19 DOI: 10.1109/tpami.2025.3646223
Qunliang Xing, Ce Zheng, Mai Xu, Jing Yang, Shengxi Li
{"title":"Breaking the Multi-Enhancement Bottleneck: Domain-Consistent Quality Enhancement for Compressed Images","authors":"Qunliang Xing, Ce Zheng, Mai Xu, Jing Yang, Shengxi Li","doi":"10.1109/tpami.2025.3646223","DOIUrl":"https://doi.org/10.1109/tpami.2025.3646223","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"17 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145785056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 DOI: 10.1109/tpami.2025.3646452
Chong Wu, Maolin Che, Hong Yan
{"title":"The CUR Decomposition of Self-Attention Matrices in Vision Transformers","authors":"Chong Wu, Maolin Che, Hong Yan","doi":"10.1109/tpami.2025.3646452","DOIUrl":"https://doi.org/10.1109/tpami.2025.3646452","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"19 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145785064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 DOI: 10.1109/tpami.2025.3646548
Pu Cao, Feng Zhou, Qing Song, Lu Yang
{"title":"Controllable Generation with Text-to-Image Diffusion Models: a Survey","authors":"Pu Cao, Feng Zhou, Qing Song, Lu Yang","doi":"10.1109/tpami.2025.3646548","DOIUrl":"https://doi.org/10.1109/tpami.2025.3646548","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"22 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145785066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 DOI: 10.1109/tpami.2025.3646473
Guangchi Fang, Bing Wang
{"title":"Efficient Scene Modeling Via Structure-Aware and Region-Prioritized 3D Gaussians","authors":"Guangchi Fang, Bing Wang","doi":"10.1109/tpami.2025.3646473","DOIUrl":"https://doi.org/10.1109/tpami.2025.3646473","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"17 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145785063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 DOI: 10.1109/tpami.2025.3645918
Yanghong Liu, Xingping Dong, Yutian Lin, Mang Ye, Kaihao Zhang, Bo Du
Pedestrian behavior exhibits inherent multi-modality, necessitating predictions that balance accuracy and diversity to adapt effectively to various complex scenarios. However, conventional noise addition in diffusion models is often aimless and unguided, leading to redundant noise reduction steps and the generation of uncontrollable samples. To address these issues, we propose a Prior Condition-Guided Diffusion Model (CGD-TraP) for multi-modal pedestrian trajectory prediction. Instead of directly adding Gaussian noise to trajectories at each timestep during the forward process, our approach leverages internal intention and external interaction to guide noise estimation. Specifically, we design two specialized modules to extract and aggregate intention and interaction features. These features are then adaptively fused through a spatial-temporal fusion based on selective state space, which estimates a controllable noisy trajectory distribution. By optimizing the noise addition process in a more controlled and efficient manner, our method ensures that the denoising process is effectively guided, resulting in predictions that are both accurate and diverse. Extensive experiments on the ETH-UCY, SDD, and NBA datasets demonstrate that CGD-TraP surpasses state-of-the-art diffusion-based and other generative methods, achieving superior efficiency, accuracy, and diversity.
{"title":"Condition-Guided Diffusion for Multi-Modal Pedestrian Trajectory Prediction Incorporating Intention and Interaction Priors.","authors":"Yanghong Liu,Xingping Dong,Yutian Lin,Mang Ye,Kaihao Zhang,Bo Du","doi":"10.1109/tpami.2025.3645918","DOIUrl":"https://doi.org/10.1109/tpami.2025.3645918","url":null,"abstract":"Pedestrian behavior exhibits inherent multi-modality, necessitating predictions that balance accuracy and diversity to adapt effectively to various complex scenarios. However, conventional noise addition in diffusion models is often aimless and unguided, leading to redundant noise reduction steps and the generation of uncontrollable samples. To address these issues, we propose a Prior Condition-Guided Diffusion Model (CGD-TraP) for multi-modal pedestrian trajectory prediction. Instead of directly adding Gaussian noise to trajectories at each timestep during the forward process, our approach leverages internal intention and external interaction to guide noise estimation. Specifically, we design two specialized modules to extract and aggregate intention and interaction features. These features are then adaptively fused through a spatial-temporal fusion based on selective state space, which estimates a controllable noisy trajectory distribution. By optimizing the noise addition process in a more controlled and efficient manner, our method ensures that the denoising process is effectively guided, resulting in predictions that are both accurate and diverse. Extensive experiments on the ETH-UCY, SDD, and NBA datasets demonstrate that CGD-TraP surpasses state-of-the-art diffusion-based and other generative methods, achieving superior efficiency, accuracy, and diversity.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"159 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145777301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 DOI: 10.1109/tpami.2025.3646002
Carlos Garrido-Munoz, Antonio Rios-Vila, Jorge Calvo-Zaragoza
{"title":"Handwritten Text Recognition: A Survey","authors":"Carlos Garrido-Munoz, Antonio Rios-Vila, Jorge Calvo-Zaragoza","doi":"10.1109/tpami.2025.3646002","DOIUrl":"https://doi.org/10.1109/tpami.2025.3646002","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"10 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145777824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}