
IEEE Transactions on Image Processing: Latest Publications

IEEE Transactions on Image Processing publication information
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tip.2026.3651208
{"title":"IEEE Transactions on Image Processing publication information","authors":"","doi":"10.1109/tip.2026.3651208","DOIUrl":"https://doi.org/10.1109/tip.2026.3651208","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"58 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145972026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tip.2026.3651951
Guangqian Guo, Pengfei Chen, Yong Guo, Huafeng Chen, Boqiang Zhang, Shan Gao
{"title":"Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios","authors":"Guangqian Guo, Pengfei Chen, Yong Guo, Huafeng Chen, Boqiang Zhang, Shan Gao","doi":"10.1109/tip.2026.3651951","DOIUrl":"https://doi.org/10.1109/tip.2026.3651951","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"26 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145972025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual Domain Optimization Algorithm for CBCT Ring Artifact Correction
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tip.2026.3652008
Yanwei Qin, Xiaohui Su, Xin Lu, Baodi Yu, Yunsong Zhao, Fanyong Meng
Compared to traditional computed tomography (CT), photon-counting detector (PCD)-based CT provides significant advantages, including enhanced CT image contrast and reduced radiation dose. However, owing to the current immaturity of PCD technology, scanned PCD data often contain stripe artifacts caused by non-functional or defective detector units, which in turn introduce ring artifacts in reconstructed CT images. The presence of ring artifacts may compromise the accuracy of CT values and even introduce pseudo-structures, thereby reducing the application value of CT images. In this paper, we propose a dual-domain optimization model that exploits the distribution characteristics of stripe artifacts in 3D projection data together with prior features of the reconstructed 3D CT images. Specifically, we show that stripe artifacts in 3D projection data exhibit both group sparsity and low-rank properties. Building on this observation, we propose a TLT (TV-l2,1-Tucker) model to eliminate ring artifacts in PCD-based cone beam CT (CBCT). In addition, an efficient iterative algorithm is designed to solve the proposed model. The effectiveness of both the model and the algorithm is evaluated through experiments on simulated and real data. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches.
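To make the group-sparse plus low-rank idea concrete, the following minimal NumPy sketch estimates a stripe component in a 2D sinogram (angles x detector channels) with a crude proximal-gradient loop. The l2,1 column shrinkage and singular-value thresholding stand in for the paper's l2,1 and Tucker terms; the function names, step size, weights, and the proximal-averaging step are illustrative assumptions rather than the authors' TLT algorithm.

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_l21_cols(A, tau):
    # column-wise group shrinkage: only a few defective detector channels survive
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    return A * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def prox_nuclear(A, tau):
    # singular-value thresholding keeps the stripe component low-rank
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft(s, tau)) @ Vt

def d_cols(X):           # forward differences across detector channels
    return np.diff(X, axis=1)

def d_cols_adj(Y):       # adjoint of d_cols
    Z = np.zeros((Y.shape[0], Y.shape[1] + 1))
    Z[:, :-1] -= Y
    Z[:, 1:] += Y
    return Z

def estimate_stripes(sino, lam_group=0.05, lam_rank=0.5, step=0.2, n_iter=200):
    """Crude proximal-gradient loop: the cleaned sinogram (sino - A) should vary
    smoothly across detector channels, while the stripe component A is both
    column-group-sparse (l2,1) and low-rank (nuclear norm as a Tucker stand-in)."""
    A = np.zeros_like(sino)
    for _ in range(n_iter):
        grad = d_cols_adj(d_cols(A - sino))   # gradient of 0.5 * ||D(sino - A)||^2 w.r.t. A
        A = A - step * grad
        # average the two proximal maps as a simple surrogate for their joint prox
        A = 0.5 * (prox_l21_cols(A, step * lam_group) + prox_nuclear(A, step * lam_rank))
    return A

# usage: corrected_sino = sino - estimate_stripes(sino), then reconstruct as usual
```

Subtracting the estimated stripe component from the projection data before reconstruction is what suppresses the corresponding ring artifacts in the image domain.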
Citations: 0
BELE: Blur Equivalent Linearized Estimator
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tip.2026.3651959
Paolo Giannitrapani, Elio D. Di Claudio, Giovanni Jacovitti
{"title":"BELE: Blur Equivalent Linearized Estimator","authors":"Paolo Giannitrapani, Elio D. Di Claudio, Giovanni Jacovitti","doi":"10.1109/tip.2026.3651959","DOIUrl":"https://doi.org/10.1109/tip.2026.3651959","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"37 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tip.2026.3652014
Hongsheng Zhang, Zhong Ji, Jingren Liu, Yanwei Pang, Jungong Han
{"title":"Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning","authors":"Hongsheng Zhang, Zhong Ji, Jingren Liu, Yanwei Pang, Jungong Han","doi":"10.1109/tip.2026.3652014","DOIUrl":"https://doi.org/10.1109/tip.2026.3652014","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"12378 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Progressive Feature Encoding with Background Perturbation Learning for Ultra-Fine-Grained Visual Categorization
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tip.2026.3651956
Xin Jiang, Ziye Fang, Fei Shen, Junyao Gao, Zechao Li
Ultra-Fine-Grained Visual Categorization (Ultra-FGVC) aims to classify objects into sub-granular categories, which requires distinguishing visually similar objects from limited data. Existing methods primarily address sample scarcity but often overlook the importance of leveraging intrinsic object features to construct highly discriminative representations, which significantly limits their effectiveness on Ultra-FGVC tasks. To address these challenges, we propose SV-Transformer, which progressively encodes object features while incorporating background perturbation modeling to generate robust and discriminative representations. At the core of our approach is a progressive feature encoder, which hierarchically extracts global semantic structures and local discriminative details from backbone-generated representations. This design enhances inter-class separability while ensuring resilience to intra-class variations. Furthermore, our background perturbation learning mechanism introduces controlled variations in the feature space, effectively mitigating the impact of sample limitations and improving the model's capacity to capture fine-grained distinctions. Comprehensive experiments demonstrate that SV-Transformer achieves state-of-the-art performance on benchmark Ultra-FGVC datasets, showcasing its efficacy in addressing the challenges of the Ultra-FGVC task.
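As a rough illustration of what perturbing the background in feature space could look like, the sketch below adds Gaussian noise only to tokens with low foreground scores, so the classifier cannot rely on background cues. The tensor shapes, threshold, and noise scale are hypothetical; this is not the paper's exact SV-Transformer mechanism.

```python
import torch

def perturb_background(tokens, fg_scores, noise_std=0.1, fg_threshold=0.5):
    """Add controlled Gaussian noise to background tokens only.

    tokens:    (B, N, C) patch/token features from the backbone (hypothetical shape)
    fg_scores: (B, N) scores in [0, 1] indicating how likely a token is foreground
    """
    bg_mask = (fg_scores < fg_threshold).unsqueeze(-1).to(tokens.dtype)  # (B, N, 1)
    noise = torch.randn_like(tokens) * noise_std
    # foreground tokens are untouched; background tokens receive controlled variation
    return tokens + bg_mask * noise

# usage (training only), e.g.:
# tokens = perturb_background(tokens, fg_scores) if model.training else tokens
```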
Citations: 0
COSOS-1k: A Benchmark Dataset and Occlusion-aware Uncertainty Learning for Multi-view Video Object Detection
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tip.2026.3651950
Wenjie Yang, Yueying Kao, Tong Liu, Yuanlong Yu, Kaiqi Huang
Confined spaces are partially or fully enclosed areas, e.g., sewage wells, whose working conditions pose significant risks to workers. The evaluation of COnfined Space Operational Safety (COSOS) refers to verifying whether workers are properly equipped with safety equipment before entering a confined space, which is crucial for protecting their safety and health. Due to the crowded nature of such environments and the small size of certain safety equipment, existing methods face significant challenges. Moreover, there is a lack of dedicated datasets to support research in this domain. In this paper, to advance research on this challenging task, we present COSOS-1k, an extensive dataset constructed from diverse confined space scenarios. It comprises multi-view videos for each scenario, covers 10 essential items of safety protective equipment and 6 worker attributes, and is annotated with expressive object locations, fine-grained attributes, and occlusion status. COSOS-1k is, to date, the first dataset tailored explicitly to real-world COSOS scenarios. In addition, we address the challenge of occlusion from three perspectives: instance, video, and view. First, at the instance level, we propose the Occlusion-aware Uncertainty Estimation (OUE) method, which leverages box-level occlusion annotations to enable part-level occlusion prediction for objects. Second, at the video level, we introduce Cross-Frame Cluster (CFC) attention, which integrates temporal context features from the same object category to mitigate the impact of occlusions in the current frame. Finally, we extend CFC to the view level to form Cross-View Cluster (CVC) attention, where complementary information is mined from another view. Extensive experiments demonstrate the effectiveness of the proposed methods and provide insights into the importance of dataset diversity and expressivity. The COSOS-1k dataset and code are available at https://github.com/deepalchemist/cosos-1k.
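The sketch below illustrates one plausible form of class-conditioned temporal aggregation in the spirit of CFC attention: each current-frame object feature attends only to same-category features collected from other frames, so an occluded object can borrow temporal context. All names, shapes, and the residual-fusion choice are assumptions for illustration, not the released implementation.

```python
import torch

def cross_frame_cluster_attention(query_feats, query_labels, memory_feats, memory_labels):
    """Class-conditioned temporal aggregation (illustrative).

    query_feats:  (Nq, C) object features detected in the current frame
    memory_feats: (Nm, C) object features gathered from other frames
    *_labels:     integer category ids; attention is restricted to matching classes
    """
    scale = query_feats.shape[-1] ** 0.5
    sim = (query_feats @ memory_feats.t()) / scale                   # (Nq, Nm)
    same_class = query_labels.unsqueeze(1) == memory_labels.unsqueeze(0)
    sim = sim.masked_fill(~same_class, float("-inf"))                # forbid cross-class attention
    attn = torch.softmax(sim, dim=-1)
    attn = torch.nan_to_num(attn)    # queries with no same-class memory fall back to themselves
    return query_feats + attn @ memory_feats                         # residual fusion
```

Extending the same masking idea to features from other camera views would give the cross-view (CVC) variant described in the abstract.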
Citations: 0
Blind Inversion using Latent Diffusion Priors
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tip.2026.3651963
Weimin Bai, Siyi Chen, Wenzheng Chen, He Sun
Diffusion models have emerged as powerful tools for solving inverse problems due to their exceptional ability to model complex prior distributions. However, existing methods predominantly assume known forward operators (i.e., non-blind settings), limiting their applicability in practical scenarios where acquiring such operators is costly. Additionally, many current approaches rely on pixel-space diffusion models, leaving the potential of more powerful latent diffusion models (LDMs) underexplored. In this paper, we introduce LatentDEM, an innovative technique that addresses the more challenging blind inverse problems using latent diffusion priors. At the core of our method is solving blind inverse problems within an iterative Expectation-Maximization (EM) framework: (1) the E-step recovers clean images from corrupted observations using LDM priors and a known forward model, and (2) the M-step estimates the forward operator based on the recovered images. Additionally, we propose two novel optimization techniques tailored to LDM priors and EM frameworks, yielding more accurate and efficient blind inversion results. As a general framework, LatentDEM supports both linear and non-linear inverse problems. Beyond common 2D image restoration tasks, it enables new capabilities in non-linear 3D inverse rendering problems. We validate LatentDEM's performance on representative 2D blind deblurring and 3D pose-free sparse-view reconstruction tasks, demonstrating its superior efficacy over prior art. The project page can be found at https://ai4imaging.github.io/latentdem/.
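To show the shape of such an EM loop, here is a toy NumPy blind-deblurring routine in which a quadratic (Wiener-style) image prior stands in for the latent diffusion prior, so both the E-step and the M-step have closed forms in the Fourier domain. The prior substitution, function names, and regularization weights are assumptions for illustration, not LatentDEM itself.

```python
import numpy as np

def pad_to(k, shape):
    """Embed a small kernel into a full-size array with its centre moved to (0, 0)."""
    out = np.zeros(shape)
    kh, kw = k.shape
    out[:kh, :kw] = k
    return np.roll(out, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def crop_center(full, size):
    """Recover a size x size kernel whose centre currently sits at (0, 0)."""
    rolled = np.roll(full, (size // 2, size // 2), axis=(0, 1))
    return rolled[:size, :size]

def blind_deconv_em(y, kernel_size=9, n_iter=30, img_reg=1e-2, ker_reg=1e-3):
    """Toy EM loop for blind deblurring of a 2D image y.
    E-step: Wiener-style image estimate under the current kernel (a quadratic prior
    stands in for the diffusion prior). M-step: per-frequency least-squares kernel
    estimate given the current image estimate."""
    H, W = y.shape
    Y = np.fft.fft2(y)
    k = np.zeros((kernel_size, kernel_size))
    k[kernel_size // 2, kernel_size // 2] = 1.0          # start from "no blur"
    X = Y.copy()
    for _ in range(n_iter):
        K = np.fft.fft2(pad_to(k, (H, W)))
        # E-step: argmin_x ||k * x - y||^2 + img_reg * ||x||^2  (closed form per frequency)
        X = np.conj(K) * Y / (np.abs(K) ** 2 + img_reg)
        # M-step: argmin_k ||k * x - y||^2 + ker_reg * ||k||^2
        Kh = np.conj(X) * Y / (np.abs(X) ** 2 + ker_reg)
        k = crop_center(np.real(np.fft.ifft2(Kh)), kernel_size)
        k = np.clip(k, 0.0, None)
        k /= max(k.sum(), 1e-12)                         # keep the kernel a valid blur
    return np.real(np.fft.ifft2(X)), k
```

In the paper's setting, the closed-form E-step would be replaced by posterior sampling under the latent diffusion prior, while the alternation structure stays the same.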
Citations: 0
Automated Orthognathic Surgery Planning based on Shape-Aware Morphology Prediction and Anatomy-Constrained Registration
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tip.2026.3651981
Yan Guo, Chenyao Li, Haitao Li, Weiwen Ge, Bolun Zeng, Jiaxuan Liu, Tianhao Wan, Shanyong Zhang, Xiaojun Chen
Orthognathic surgery demands precise preoperative planning to achieve optimal functional and aesthetic results, yet current practice remains labor-intensive and highly dependent on surgical expertise. To address these challenges, we propose OrthoPlanner, a novel two-stage framework for automated orthognathic surgical planning. In the first stage, we develop JawFormer, a shape-sensitive transformer network that predicts postoperative bone morphology directly from preoperative 3D point cloud data. Built upon a point cloud encoder-decoder architecture, the network integrates anatomical priors through a region-based feature alignment module. This enables precise modeling of structural changes while preserving critical anatomical features. In the second stage, we introduce a symmetry-constrained rigid alignment algorithm that automatically outputs the precise translation and rotation of each osteotomized bone segment required to match the predicted morphology. This ensures bilateral anatomical consistency and facilitates interpretable surgical plans. Compared with existing approaches, our method achieves superior quantitative performance and enhanced visualization results, as demonstrated by 65 experiments on real clinical datasets. Moreover, OrthoPlanner significantly reduces planning time and manual workload, while ensuring reproducible and clinically acceptable outcomes.
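The rigid-alignment step can be illustrated with a standard Kabsch/Procrustes solve that returns the rotation and translation mapping an osteotomized segment onto its predicted target morphology. The symmetry constraint described in the abstract is not modelled here; the function below is a generic building block, not the authors' algorithm.

```python
import numpy as np

def rigid_align(source, target):
    """Kabsch/Procrustes solve: find rotation R and translation t minimising
    ||R @ p + t - q|| over corresponding points p in source and q in target.

    source, target: (N, 3) arrays of corresponding landmark/vertex positions."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    S, T = source - mu_s, target - mu_t            # centre both point sets
    U, _, Vt = np.linalg.svd(S.T @ T)              # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against a reflected solution
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t

# usage: R, t = rigid_align(segment_points, predicted_points)
#        moved_segment = segment_points @ R.T + t
```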
Citations: 0
Subjective-objective Emotion Correlated Generation Network for Subjective Video Captioning
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tip.2025.3649363
Weidong Chen, Cheng Ye, Peipei Song, Lei Zhang, Yongdong Zhang, Zhendong Mao
{"title":"Subjective-objective Emotion Correlated Generation Network for Subjective Video Captioning","authors":"Weidong Chen, Cheng Ye, Peipei Song, Lei Zhang, Yongdong Zhang, Zhendong Mao","doi":"10.1109/tip.2025.3649363","DOIUrl":"https://doi.org/10.1109/tip.2025.3649363","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"6 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0