Dual Domain Optimization Algorithm for CBCT Ring Artifact Correction
Pub Date : 2026-01-15 DOI: 10.1109/tip.2026.3652008
Yanwei Qin, Xiaohui Su, Xin Lu, Baodi Yu, Yunsong Zhao, Fanyong Meng
Compared to traditional computed tomography (CT), photon-counting detector (PCD)-based CT provides significant advantages, including enhanced CT image contrast and reduced radiation dose. However, owing to the current immaturity of PCD technology, scanned PCD data often contain stripe artifacts resulting from non-functional or defective detector units, which subsequently introduce ring artifacts in reconstructed CT images. The presence of ring artifacts may compromise the accuracy of CT values and even introduce pseudo-structures, thereby reducing the application value of CT images. In this paper, we propose a dual-domain optimization model that takes advantage of the distribution characteristics of stripe artifacts in 3D projection data and the prior features of reconstructed 3D CT images. Specifically, we demonstrate that stripe artifacts in 3D projection data exhibit both group sparsity and low-rank properties. Building on this observation, we propose a TLT (TV-l2,1-Tucker) model to eliminate ring artifacts in PCD-based cone beam CT (CBCT). In addition, an efficient iterative algorithm is designed to solve the proposed model. The effectiveness of both the model and the algorithm is evaluated through simulated and real data experiments. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches.
BELE: Blur Equivalent Linearized Estimator
Pub Date : 2026-01-13 DOI: 10.1109/tip.2026.3651959
Paolo Giannitrapani, Elio D. Di Claudio, Giovanni Jacovitti
{"title":"BELE: Blur Equivalent Linearized Estimator","authors":"Paolo Giannitrapani, Elio D. Di Claudio, Giovanni Jacovitti","doi":"10.1109/tip.2026.3651959","DOIUrl":"https://doi.org/10.1109/tip.2026.3651959","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"37 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Progressive Feature Encoding with Background Perturbation Learning for Ultra-Fine-Grained Visual Categorization
Pub Date : 2026-01-13 DOI: 10.1109/tip.2026.3651956
Xin Jiang, Ziye Fang, Fei Shen, Junyao Gao, Zechao Li
Ultra-Fine-Grained Visual Categorization (Ultra-FGVC) aims to classify objects into sub-granular categories, presenting the challenge of distinguishing visually similar objects with limited data. Existing methods primarily address sample scarcity but often overlook the importance of leveraging intrinsic object features to construct highly discriminative representations. This limitation significantly constrains their effectiveness in Ultra-FGVC tasks. To address these challenges, we propose SV-Transformer, which progressively encodes object features while incorporating background perturbation modeling to generate robust and discriminative representations. At the core of our approach is a progressive feature encoder, which hierarchically extracts global semantic structures and local discriminative details from backbone-generated representations. This design enhances inter-class separability while ensuring resilience to intra-class variations. Furthermore, our background perturbation learning mechanism introduces controlled variations in the feature space, effectively mitigating the impact of sample limitations and improving the model's capacity to capture fine-grained distinctions. Comprehensive experiments demonstrate that SV-Transformer achieves state-of-the-art performance on benchmark Ultra-FGVC datasets, showcasing its efficacy in addressing the challenges of the Ultra-FGVC task.
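The background perturbation mechanism is described above only at a high level. Below is a minimal PyTorch sketch of one way controlled feature-space perturbation of background tokens could look; the module name, the foreground mask, and the noise scaling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BackgroundPerturbation(nn.Module):
    """Illustrative sketch: inject scaled Gaussian noise into background tokens
    while leaving (assumed) foreground tokens untouched during training."""

    def __init__(self, noise_scale: float = 0.1):
        super().__init__()
        self.noise_scale = noise_scale

    def forward(self, tokens: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
        # tokens:  (B, N, C) patch/token features from the backbone
        # fg_mask: (B, N), 1 for foreground tokens, 0 for background
        if not self.training:
            return tokens
        # Scale noise by the per-token feature spread so the perturbation stays "controlled"
        noise = torch.randn_like(tokens) * self.noise_scale * tokens.std(dim=-1, keepdim=True)
        bg = (1.0 - fg_mask.float()).unsqueeze(-1)   # perturb only background positions
        return tokens + bg * noise

# Usage sketch (hypothetical): tokens from a ViT backbone, fg_mask from attention rollout
# perturb = BackgroundPerturbation(noise_scale=0.1)
# tokens = perturb(tokens, fg_mask)
```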
{"title":"Progressive Feature Encoding with Background Perturbation Learning for Ultra-Fine-Grained Visual Categorization.","authors":"Xin Jiang,Ziye Fang,Fei Shen,Junyao Gao,Zechao Li","doi":"10.1109/tip.2026.3651956","DOIUrl":"https://doi.org/10.1109/tip.2026.3651956","url":null,"abstract":"Ultra-Fine-Grained Visual Categorization (Ultra-FGVC) aims to classify objects into sub-granular categories, presenting the challenge of distinguishing visually similar objects with limited data. Existing methods primarily address sample scarcity but often overlook the importance of leveraging intrinsic object features to construct highly discriminative representations. This limitation significantly constrains their effectiveness in Ultra-FGVC tasks. To address these challenges, we propose SV-Transformer that progressively encodes object features while incorporating background perturbation modeling to generate robust and discriminative representations. At the core of our approach is a progressive feature encoder, which hierarchically extracts global semantic structures and local discriminative details from backbone-generated representations. This design enhances inter-class separability while ensuring resilience to intra-class variations. Furthermore, our background perturbation learning mechanism introduces controlled variations in the feature space, effectively mitigating the impact of sample limitations and improving the model's capacity to capture fine-grained distinctions. Comprehensive experiments demonstrate that SV-Transformer achieves state-of-the-art performance on benchmark Ultra-FGVC datasets, showcasing its efficacy in addressing the challenges of Ultra-FGVC task.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"391 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
COSOS-1k: A Benchmark Dataset and Occlusion-aware Uncertainty Learning for Multi-view Video Object Detection
Pub Date : 2026-01-13 DOI: 10.1109/tip.2026.3651950
Wenjie Yang, Yueying Kao, Tong Liu, Yuanlong Yu, Kaiqi Huang
Confined spaces refer to partially or fully enclosed areas, e.g., sewage wells, where working conditions pose significant risks to workers. The evaluation of COnfined Space Operational Safety (COSOS) refers to verifying whether workers are properly equipped with safety equipment before entering a confined space, which is crucial for protecting their safety and health. Due to the crowded nature of such environments and the small size of certain safety equipment, existing methods face significant challenges. Moreover, there is a lack of dedicated datasets to support research in this domain. In this paper, in order to advance research in this challenging task, we present COSOS-1k, an extensive dataset constructed from diverse confined space scenarios. It comprises multi-view videos for each scenario, covers 10 essential items of safety protective equipment and 6 worker attributes, and is annotated with expressive object locations, fine-grained attributes, and occlusion status. To date, COSOS-1k is the first dataset tailored explicitly for real-world COSOS scenarios. In addition, we address the challenge of occlusion from three perspectives: instance, video, and view. First, at the instance level, we propose the Occlusion-aware Uncertainty Estimation (OUE) method, which leverages box-level occlusion annotations to enable part-level occlusion prediction for objects. Second, at the video level, we introduce Cross-Frame Cluster (CFC) attention, which integrates temporal context features from the same object category to mitigate the impact of occlusions in the current frame. Finally, we extend CFC to the view level to form Cross-View Cluster (CVC) attention, where complementary information is mined from another view. Extensive experiments demonstrate the effectiveness of the proposed methods and provide insights into the importance of dataset diversity and expressivity. The COSOS-1k dataset and code are available at https://github.com/deepalchemist/cosos-1k.
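Cross-Frame Cluster attention is described above only in words. The sketch below illustrates the general idea (current-frame instance features attend to a per-category pool gathered from other frames); the class name, shapes, residual fusion, and the use of nn.MultiheadAttention are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossFrameClusterAttention(nn.Module):
    """Sketch: instances in the current frame attend to same-category
    instance features collected from other frames of the video."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cur_feats, cur_labels, mem_feats, mem_labels):
        # cur_feats: (N, C) current-frame instance features, cur_labels: (N,)
        # mem_feats: (M, C) instance features from other frames, mem_labels: (M,)
        out = cur_feats.clone()
        for c in cur_labels.unique():
            q = cur_feats[cur_labels == c].unsqueeze(0)    # (1, n_c, C)
            kv = mem_feats[mem_labels == c].unsqueeze(0)   # (1, m_c, C)
            if kv.shape[1] == 0:
                continue                                    # no temporal context for this class
            attended, _ = self.attn(q, kv, kv)
            out[cur_labels == c] = (q + attended).squeeze(0)  # residual fusion
        return out
```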
{"title":"COSOS-1k: A Benchmark Dataset and Occlusion-aware Uncertainty Learning for Multi-view Video Object Detection.","authors":"Wenjie Yang,Yueying Kao,Tong Liu,Yuanlong Yu,Kaiqi Huang","doi":"10.1109/tip.2026.3651950","DOIUrl":"https://doi.org/10.1109/tip.2026.3651950","url":null,"abstract":"Confined spaces refer to partially or fully enclosed areas, e.g., sewage wells, where working conditions pose significant risks to the workers. The evaluation of COfined Space Operational Safety (COSOS) refers to verifying whether workers are properly equipped with safety equipment before entering a confined space, which is crucial for protecting their safety and health. Due to the crowded nature of such environments and the small size of certain safety equipment, existing methods face significant challenges. Moreover, there is a lack of dedicated datasets to support research in this domain. In this paper, in order to advance research in this challenging task, we present COSOS-1k, an extensive dataset constructed from diverse confined space scenarios. It comprises multi-view videos for each scenario, covers 10 essential safety protective equipments and 6 attributes of worker, and is annotated with expressive object locations, fine-grained attributes, and occlusion status. The COSOS-1k is the first dataset known to date, tailored explicitly for the real-world COSOS scenarios. In addition, we address the challenge of occlusion from three perspectives: instance, video, and view. Firstly, at the instance level, we propose Occlusion-aware Uncertainty Estimation (OUE) method, which leverages box-level occlusion annotations to enable part-level occlusion prediction for objects. Secondly, at the video level, we introduce Cross-Frame Cluster (CFC) attention, which integrates temporal context features from the same object category to mitigate the impact of occlusions in the current frame. Finally, we extend CFC to the view level and form Cross-View Cluster (CVC) attention, where complementary information is mined from another view. Extensive experiments demonstrate the effectiveness of the proposed methods and provide insights into the importance of dataset diversity and expressivity. The COSOS-1k dataset and code are available at https://github.com/deepalchemist/cosos-1k.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"26 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blind Inversion using Latent Diffusion Priors
Pub Date : 2026-01-13 DOI: 10.1109/tip.2026.3651963
Weimin Bai, Siyi Chen, Wenzheng Chen, He Sun
Diffusion models have emerged as powerful tools for solving inverse problems due to their exceptional ability to model complex prior distributions. However, existing methods predominantly assume known forward operators (i.e., non-blind), limiting their applicability in practical settings where acquiring such operators is costly. Additionally, many current approaches rely on pixel-space diffusion models, leaving the potential of more powerful latent diffusion models (LDMs) underexplored. In this paper, we introduce LatentDEM, an innovative technique that addresses more challenging blind inverse problems using latent diffusion priors. At the core of our method is solving blind inverse problems within an iterative Expectation-Maximization (EM) framework: (1) the E-step recovers clean images from corrupted observations using LDM priors and a known forward model, and (2) the M-step estimates the forward operator based on the recovered images. Additionally, we propose two novel optimization techniques tailored for LDM priors and EM frameworks, yielding more accurate and efficient blind inversion results. As a general framework, LatentDEM supports both linear and non-linear inverse problems. Beyond common 2D image restoration tasks, it enables new capabilities in non-linear 3D inverse rendering problems. We validate LatentDEM's performance on representative 2D blind deblurring and 3D pose-free sparse-view reconstruction tasks, demonstrating its superior efficacy over prior art. The project page can be found at https://ai4imaging.github.io/latentdem/.
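As a rough illustration of the E-step/M-step alternation described above (not the authors' LatentDEM algorithm), a blind deblurring loop could alternate between restoring the image under the current kernel with a diffusion-prior solver and re-estimating the kernel from the restored image. Here `restore_with_diffusion_prior` is a hypothetical placeholder for any latent-diffusion posterior sampler, and the kernel update is a generic Wiener-style least-squares step.

```python
import numpy as np

def estimate_kernel(y, x, ksize=15, reg=1e-3):
    """M-step sketch: regularized least-squares blur-kernel estimate in the
    Fourier domain, assuming y ~ x (*) k under circular convolution."""
    Y, X = np.fft.fft2(y), np.fft.fft2(x)
    K = (np.conj(X) * Y) / (np.abs(X) ** 2 + reg)    # Wiener-style division
    k = np.fft.fftshift(np.real(np.fft.ifft2(K)))    # center the kernel
    c = k.shape[0] // 2
    k = k[c - ksize // 2: c + ksize // 2 + 1, c - ksize // 2: c + ksize // 2 + 1]
    k = np.clip(k, 0, None)
    return k / (k.sum() + 1e-12)                     # normalize to a valid kernel

def blind_em(y, restore_with_diffusion_prior, n_iter=10, ksize=15):
    """EM-style alternation: E-step restores the image, M-step updates the kernel."""
    k = np.zeros((ksize, ksize))
    k[ksize // 2, ksize // 2] = 1.0                  # start from an identity kernel
    x = y.copy()
    for _ in range(n_iter):
        x = restore_with_diffusion_prior(y, k)       # E-step (placeholder diffusion-prior solver)
        k = estimate_kernel(y, x, ksize=ksize)       # M-step (closed-form kernel update)
    return x, k
```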
{"title":"Blind Inversion using Latent Diffusion Priors.","authors":"Weimin Bai,Siyi Chen,Wenzheng Chen,He Sun","doi":"10.1109/tip.2026.3651963","DOIUrl":"https://doi.org/10.1109/tip.2026.3651963","url":null,"abstract":"Diffusion models have emerged as powerful tools for solving inverse problems due to their exceptional ability to model complex prior distributions. However, existing methods predominantly assume known forward operators (i.e., non-blind), limiting their applicability in practical settings where acquiring such operators is costly. Additionally, many current approaches rely on pixel-space diffusion models, leaving the potential of more powerful latent diffusion models (LDMs) underexplored. In this paper, we introduce LatentDEM, an innovative technique that addresses more challenging blind inverse problems using latent diffusion priors. At the core of our method is solving blind inverse problems within an iterative Expectation-Maximization (EM) framework: (1) the E-step recovers clean images from corrupted observations using LDM priors and a known forward model, and (2) the M-step estimates the forward operator based on the recovered images. Additionally, we propose two novel optimization techniques tailored for LDM priors and EM frameworks, yielding more accurate and efficient blind inversion results. As a general framework, LatentDEM supports both linear and non-linear inverse problems. Beyond common 2D image restoration tasks, it enables new capabilities in non-linear 3D inverse rendering problems. We validate LatentDEM's performance on representative 2D blind deblurring and 3D pose-free sparse-view reconstruction tasks, demonstrating its superior efficacy over prior arts. The project page can be found at https://ai4imaging.github.io/latentdem/.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"259 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Orthognathic Surgery Planning based on Shape-Aware Morphology Prediction and Anatomy-Constrained Registration
Pub Date : 2026-01-13 DOI: 10.1109/tip.2026.3651981
Yan Guo, Chenyao Li, Haitao Li, Weiwen Ge, Bolun Zeng, Jiaxuan Liu, Tianhao Wan, Shanyong Zhang, Xiaojun Chen
Orthognathic surgery demands precise preoperative planning to achieve optimal functional and aesthetic results, yet current practices remain labor-intensive and highly dependent on surgical expertise. To address these challenges, we propose OrthoPlanner, a novel two-stage framework for automated orthognathic surgical planning. In the first stage, we develop JawFormer, a shape-sensitive transformer network that predicts postoperative bone morphology directly from preoperative 3D point cloud data. Built upon a point cloud encoder-decoder architecture, the network integrates anatomical priors through a region-based feature alignment module. This enables precise modeling of structural changes while preserving critical anatomical features. In the second stage, we introduce a symmetry-constrained rigid alignment algorithm that automatically outputs the precise translation and rotation of each osteotomized bone segment required to match the predicted morphology. This ensures bilateral anatomical consistency and facilitates interpretable surgical plans. Compared with existing approaches, our method achieves superior quantitative performance and enhanced visualization results, as demonstrated by 65 experiments on real clinical datasets. Moreover, OrthoPlanner significantly reduces planning time and manual workload, while ensuring reproducible and clinically acceptable outcomes.
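At its core, the second stage's rigid alignment amounts to estimating a rotation and translation between corresponding point sets; a standard Kabsch/Procrustes solution is sketched below for reference. The symmetry constraint described in the abstract is not reproduced here, and the assumed inputs (corresponded point arrays) are illustrative.

```python
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Kabsch/Procrustes sketch: find rotation R (3x3) and translation t (3,)
    minimizing sum_i ||R @ src_i + t - dst_i||^2 over corresponded points (N, 3)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Usage sketch: align an osteotomized segment's preoperative points (src)
# to the predicted postoperative morphology (dst), both (N, 3) with known correspondence.
# R, t = rigid_align(src, dst)
```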
{"title":"Automated Orthognathic Surgery Planning based on Shape-Aware Morphology Prediction and Anatomy-Constrained Registration.","authors":"Yan Guo,Chenyao Li,Haitao Li,Weiwen Ge,Bolun Zeng,Jiaxuan Liu,Tianhao Wan,Shanyong Zhang,Xiaojun Chen","doi":"10.1109/tip.2026.3651981","DOIUrl":"https://doi.org/10.1109/tip.2026.3651981","url":null,"abstract":"Orthognathic surgery demands precise preoperative planning to achieve optimal functional and aesthetic results, yet current practices remain labor-intensive and highly dependent on surgical expertise. To address these challenge, we propose OrthoPlanner, a novel two-stage framework for automated orthognathic surgical planning. In the first stage, we develop JawFormer, a shape sensitive transformer network that predicts postoperative bone morphology directly from preoperative 3D point cloud data. Built upon a point cloud encoder-decoder architecture, the network integrates anatomical priors through a region-based feature alignment module. This enables precise modeling of structural changes while preserving critical anatomical features. In the second stage, we introduce a symmetry-constrained rigid alignment algorithm that automatically outputs the precise translation and rotation of each osteotomized bone segment required to match the predicted morphology. This ensures bilateral anatomical consistency and facilitates interpretable surgical plans. Compared with existing approaches, our method achieves superior quantitative performance and enhanced visualization results, as demonstrated by 65 experiments on real clinical datasets. Moreover, OrthoPlanner significantly reduces planning time and manual workload, while ensuring reproducible and clinically acceptable outcomes.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"54 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}