Pub Date : 2025-02-05 | DOI: 10.1109/TCI.2025.3539021
Nikola Janjušević;Amirhossein Khalilian-Gourtani;Adeen Flinker;Li Feng;Yao Wang
Nonlocal self-similarity within images has become an increasingly popular prior in deep-learning models. Despite their successful image restoration performance, such models remain largely uninterpretable due to their black-box construction. Our previous studies have shown that interpretable construction of a fully convolutional denoiser (CDLNet), with performance on par with state-of-the-art black-box counterparts, is achievable by unrolling a convolutional dictionary learning algorithm. In this manuscript, we seek an interpretable construction of a convolutional network with a nonlocal self-similarity prior that performs on par with black-box nonlocal models. We show that such an architecture can be effectively achieved by upgrading the $\ell_{1}$ sparsity prior (soft-thresholding) of CDLNet to an image-adaptive group-sparsity prior (group-thresholding). The proposed learned group-thresholding makes use of nonlocal attention to perform spatially varying soft-thresholding on the latent representation. To enable effective training and inference on large images with global artifacts, we propose a novel circulant-sparse attention. We achieve competitive natural-image denoising performance compared to black-box nonlocal DNNs and transformers. The interpretable construction of our network allows for a straightforward extension to Compressed Sensing MRI (CS-MRI), yielding state-of-the-art performance. Lastly, we show robustness to noise-level mismatches between training and inference for denoising and CS-MRI reconstruction.
{"title":"GroupCDL: Interpretable Denoising and Compressed Sensing MRI via Learned Group-Sparsity and Circulant Attention","authors":"Nikola Janjušević;Amirhossein Khalilian-Gourtani;Adeen Flinker;Li Feng;Yao Wang","doi":"10.1109/TCI.2025.3539021","DOIUrl":"https://doi.org/10.1109/TCI.2025.3539021","url":null,"abstract":"Nonlocal self-similarity within images has become an increasingly popular prior in deep-learning models. Despite their successful image restoration performance, such models remain largely uninterpretable due to their black-box construction. Our previous studies have shown that interpretable construction of a fully convolutional denoiser (CDLNet), with performance on par with state-of-the-art black-box counterparts, is achievable by unrolling a convolutional dictionary learning algorithm. In this manuscript, we seek an interpretable construction of a convolutional network with a nonlocal self-similarity prior that performs on par with black-box nonlocal models. We show that such an architecture can be effectively achieved by upgrading the <inline-formula><tex-math>$ell _{1}$</tex-math></inline-formula> sparsity prior (soft-thresholding) of CDLNet to an image-adaptive group-sparsity prior (group-thresholding). The proposed learned group-thresholding makes use of nonlocal attention to perform spatially varying soft-thresholding on the latent representation. To enable effective training and inference on large images with global artifacts, we propose a novel <italic>circulant-sparse attention</i>. We achieve competitive natural-image denoising performance compared to black-box nonlocal DNNs and transformers. The interpretable construction of our network allows for a straightforward extension to Compressed Sensing MRI (CS-MRI), yielding state-of-the-art performance. Lastly, we show robustness to noise-level mismatches between training and inference for denoising and CS-MRI reconstruction.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"201-212"},"PeriodicalIF":4.2,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-31 | DOI: 10.1109/TCI.2025.3536078
Jia Wu;Jinzhao Lin;Yu Pang;Xiaoming Jiang;Xinwei Li;Hongying Meng;Yamei Luo;Lu Yang;Zhangyong Li
Sparse-view computed tomography aims to reduce radiation exposure but often suffers from degraded image quality due to insufficient projection data. Traditional methods struggle to balance data fidelity and detail preservation, particularly in high-frequency regions. In this paper, we propose a Cascaded Frequency-Encoded Multi-Scale Neural Fields (Ca-FMNF) framework. We reformulate the reconstruction task as refining high-frequency residuals upon a high-quality low-frequency foundation. It integrates a pre-trained iterative unfolding network for initial low-frequency estimation with an FMNF to represent high-frequency residuals. The FMNF parameters are optimized by minimizing the discrepancy between the measured projections and those estimated through the imaging forward model, thereby refining the residuals based on the initial estimation. This dual-stage strategy enhances data consistency and preserves fine structures. Extensive experiments on simulated and clinical datasets demonstrate that our method achieves the best results in both quantitative metrics and visual quality, effectively reducing artifacts and preserving structural details.
{"title":"Cascaded Frequency-Encoded Multi-Scale Neural Fields for Sparse-View CT Reconstruction","authors":"Jia Wu;Jinzhao Lin;Yu Pang;Xiaoming Jiang;Xinwei Li;Hongying Meng;Yamei Luo;Lu Yang;Zhangyong Li","doi":"10.1109/TCI.2025.3536078","DOIUrl":"https://doi.org/10.1109/TCI.2025.3536078","url":null,"abstract":"Sparse-view computed tomography aims to reduce radiation exposure but often suffers from degraded image quality due to insufficient projection data. Traditional methods struggle to balance data fidelity and detail preservation, particularly in high-frequency regions. In this paper, we propose a Cascaded Frequency-Encoded Multi-Scale Neural Fields (Ca-FMNF) framework. We reformulate the reconstruction task as refining high-frequency residuals upon a high-quality low-frequency foundation. It integrates a pre-trained iterative unfolding network for initial low-frequency estimation with a FMNF to represent high-frequency residuals. The FMNF parameters are optimized by minimizing the discrepancy between the measured projections and those estimated through the imaging forward model, thereby refining the residuals based on the initial estimation. This dual-stage strategy enhances data consistency and preserves fine structures. The extensive experiments on simulated and clinical datasets demonstrate that our method achieves the optimal results in both quantitative metrics and visual quality, effectively reducing artifacts and preserving structural details.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"237-250"},"PeriodicalIF":4.2,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-29 | DOI: 10.1109/TCI.2025.3536106
Zicheng Liu;Yingying Qin;Jean-Claude Tinguely;Krishna Agarwal
The point spread function (PSF) plays a central role in modern computational microscopy techniques. Various approaches for measuring and modeling point spread functions have been proposed for both fluorescence and label-free microscopes. Among the various PSF candidates, it is often difficult to evaluate which PSF best suits the microscope and the experimental conditions. Qualitative visual assessment is often used instead, because few techniques exist to quantify the quality of a PSF as a basis for comparing different candidates and selecting the best one. To address this gap, we present a validation scheme based on the concept of confidence intervals to evaluate the quality of fit of the PSF. This scheme is rigorous and supports precise validation of any microscope's PSF irrespective of its complexity, improving the performance of computational nanoscopy built upon it. We first demonstrate a proof of principle of our scheme for a complex but practical label-free coherent imaging setup by comparing a variety of scalar and dyadic PSFs. Next, we validate our approach on conventional scalar PSFs using fluorescence-based single-molecule localization microscopy, which requires the PSF to compute the locations of single molecules. Lastly, we demonstrate how the scheme can be used in practice for challenging scenarios, using images of gold nanorods placed on and illuminated by a photonic chip waveguide and imaged with a label-free dark-field microscopy setup. Through these experiments, we demonstrate the generality and versatility of our PSF validation approach for the microscopy domain.
{"title":"Computational Comparison and Validation of Point Spread Functions for Optical Microscopes","authors":"Zicheng Liu;Yingying Qin;Jean-Claude Tinguely;Krishna Agarwal","doi":"10.1109/TCI.2025.3536106","DOIUrl":"https://doi.org/10.1109/TCI.2025.3536106","url":null,"abstract":"Point spread function (PSF) is quite important in modern computational microscopy techniques. Various approaches for measuring and modeling point spread functions have been proposed for both fluorescence and label-free microscopes. Among the various PSF candidates, it is often difficult to evaluate which PSF best suits the microscope and the experimental conditions. Visual qualification is often applied because there are hardly any techniques to quantify the quality of PSF as a basis for comparing different candidates and selecting the best one. To address this gap, we present a validation scheme based on the concept of confidence interval to evaluate the quality of fit of the PSF. This scheme is rigorous and supports precise validation for any microscope's PSF irrespective of their complexity, improving the performance of computational nanoscopy on them. We first demonstrate proof-of-principle of our scheme for a complex but practical label-free coherent imaging setup by comparing a variety of scalar and dyadic PSFs. Next, we validate our approach on conventional scalar PSFs using fluorescence based single molecule localization microscopy which needs PSF to compute the locations of single molecules. Lastly, we demonstrate how the scheme can be used in practice for challenging scenarios using images of gold nanorods placed on and illuminated by a photonic chip waveguide imaged using a label-free dark-field microscopy setup. Through these experiments, we demonstrate the generality and versatility of our PSF validation approach for the microscopy domain.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"170-178"},"PeriodicalIF":4.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10857452","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143430536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-29 | DOI: 10.1109/TCI.2025.3536092
María Peña;Diego Gutierrez;Julio Marco
Time-gated non-line-of-sight (NLOS) imaging methods reconstruct scenes hidden around a corner by inverting the optical path of indirect photons measured at visible surfaces. These methods are, however, hindered by intricate, time-consuming calibration processes involving expensive capture hardware. Simulation of transient light transport in synthetic 3D scenes has become a powerful but computationally intensive alternative for analysis and benchmarking of NLOS imaging methods. NLOS imaging methods also suffer from high computational complexity. In our work, we rely on dimensionality reduction to provide a real-time simulation framework for NLOS imaging performance analysis. We extend steady-state light transport in self-contained 2D worlds to account for the propagation of time-resolved illumination by reformulating the transient path integral in 2D. We couple it with the recent phasor-field formulation of NLOS imaging to provide an end-to-end simulation and imaging pipeline that incorporates different NLOS imaging camera models. Our pipeline yields real-time NLOS images and progressive refinement of light transport simulations. We allow comprehensive control over a wide set of scene, rendering, and NLOS imaging parameters, providing effective real-time analysis of their impact on reconstruction quality. We illustrate the effectiveness of our pipeline by validating 2D counterparts of existing 3D NLOS imaging experiments, and provide an extensive analysis of imaging performance covering a wider set of NLOS imaging conditions, such as filtering, reflectance, and geometric features in NLOS imaging setups.
{"title":"Looking Around Flatland: End-to-End 2D Real-Time NLOS Imaging","authors":"María Peña;Diego Gutierrez;Julio Marco","doi":"10.1109/TCI.2025.3536092","DOIUrl":"https://doi.org/10.1109/TCI.2025.3536092","url":null,"abstract":"Time-gated non-line-of-sight (NLOS) imaging methods reconstruct scenes hidden around a corner by inverting the optical path of indirect photons measured at visible surfaces. These methods are, however, hindered by intricate, time-consuming calibration processes involving expensive capture hardware. Simulation of transient light transport in synthetic 3D scenes has become a powerful but computationally-intensive alternative for analysis and benchmarking of NLOS imaging methods. NLOS imaging methods also suffer from high computational complexity. In our work, we rely on dimensionality reduction to provide a real-time simulation framework for NLOS imaging performance analysis. We extend steady-state light transport in self-contained 2D worlds to take into account the propagation of time-resolved illumination by reformulating the transient path integral in 2D. We couple it with the recent phasor-field formulation of NLOS imaging to provide an end-to-end simulation and imaging pipeline that incorporates different NLOS imaging camera models. Our pipeline yields real-time NLOS images and progressive refinement of light transport simulations. We allow comprehensive control on a wide set of scene, rendering, and NLOS imaging parameters, providing effective real-time analysis of their impact on reconstruction quality. We illustrate the effectiveness of our pipeline by validating 2D counterparts of existing 3D NLOS imaging experiments, and provide an extensive analysis of imaging performance including a wider set of NLOS imaging conditions, such as filtering, reflectance, and geometric features in NLOS imaging setups.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"189-200"},"PeriodicalIF":4.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10857386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143430534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-29 | DOI: 10.1109/TCI.2025.3531717
Laigan Luo;Benshun Yi;Zhongyuan Wang;Zheng He;Chao Zhu
Space-time video super-resolution aims to reconstruct high-frame-rate, high-resolution video from its low-frame-rate, low-resolution counterpart. Currently, the task faces the challenge of efficiently extracting long-range temporal information from the available frames. Meanwhile, existing methods can only produce results for a specific moment and cannot interpolate high-resolution frames at continuous time stamps. To address these issues, we propose a multi-stage feature enhancement method that better utilizes the limited spatio-temporal information subject to the efficiency constraint. Our approach involves a pre-alignment module that extracts coarsely aligned features from the adjacent odd-numbered frames in the first stage. In the second stage, we use a bidirectional recurrent module to refine the aligned features by exploiting the long-range information from all input frames while simultaneously performing video frame interpolation. The proposed video frame interpolation module concatenates temporal information with spatial features to achieve continuous interpolation, which refines the interpolated features progressively and enhances the spatial information by utilizing features of different scales. Extensive experiments on various benchmarks demonstrate that the proposed method outperforms the state of the art in both quantitative metrics and visual quality.
{"title":"Dual Bidirectional Feature Enhancement Network for Continuous Space-Time Video Super-Resolution","authors":"Laigan Luo;Benshun Yi;Zhongyuan Wang;Zheng He;Chao Zhu","doi":"10.1109/TCI.2025.3531717","DOIUrl":"https://doi.org/10.1109/TCI.2025.3531717","url":null,"abstract":"Space-time video super-resolution aims to reconstruct the high-frame-rate and high-resolution video from the corresponding low-frame-rate and low-resolution counterpart. Currently, the task faces the challenge of efficiently extracting long-range temporal information from available frames. Meanwhile, existing methods can only produce results for a specific moment and cannot interpolate high-resolution frames for consecutive time stamps. To address these issues, we propose a multi-stage feature enhancement method that better utilizes the limited spatio-temporal information subject to the efficiency constraint. Our approach involves a pre-alignment module that extracts coarse aligned features from the adjacent odd-numbered frames in the first stage. In the second stage, we use a bidirectional recurrent module to refine the aligned features by exploiting the long-range information from all input frames while simultaneously performing video frame interpolation. The proposed video frame interpolation module concatenates temporal information with spatial features to achieve continuous interpolation, which refines the interpolated feature progressively and enhances the spatial information by utilizing the features of different scales. Extensive experiments on various benchmarks demonstrate that the proposed method outperforms state-of-the-art in both quantitative metrics and visual effects.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"228-236"},"PeriodicalIF":4.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143535473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-20 | DOI: 10.1109/TCI.2025.3531707
Xianghong Wang;Jiajun Xiang;Aihua Mao;Jiayi Xie;Peng Jin;Mingchao Ding;Yixuan Yuan;Yanye Lu;Lequan Yu;Hongmin Cai;Baiying Lei;Tianye Niu
Dual-energy computed tomography (DECT) offers quantitative insights and facilitates material decomposition, aiding in precise diagnosis and treatment planning. However, existing methods for material decomposition, often tailored to specific material types, lack generalizability and incur additional computational load with each added material. We propose a CLIP-Driven Universal Model for adaptive Multi-Material Decomposition (MMD) to tackle this challenge. This model utilizes the semantic capabilities of text embeddings from Contrastive Language-Image Pre-training (CLIP), allowing a single network to manage structured feature embeddings for multiple materials. A novel Siamese encoder and a differential map fusion technique have also been integrated to enhance decomposition accuracy while maintaining robustness across various conditions. Experiments on simulated and physical patient studies demonstrate our model's superiority over traditional methods. Notably, it improves the Dice Similarity Coefficient by 4.1%. These results underscore the potential of our network in clinical MMD applications, suggesting a promising avenue for enhancing DECT imaging analysis.
{"title":"Clip-Driven Universal Model for Multi-Material Decomposition in Dual-Energy CT","authors":"Xianghong Wang;Jiajun Xiang;Aihua Mao;Jiayi Xie;Peng Jin;Mingchao Ding;Yixuan Yuan;Yanye Lu;Lequan Yu;Hongmin Cai;Baiying Lei;Tianye Niu","doi":"10.1109/TCI.2025.3531707","DOIUrl":"https://doi.org/10.1109/TCI.2025.3531707","url":null,"abstract":"Dual-energy computed tomography (DECT) offers quantitative insights and facilitates material decomposition, aiding in precise diagnosis and treatment planning. However, existing methods for material decomposition, often tailored to specific material types, need more generalizability and increase computational load with each additional material. We propose a CLIP-Driven Universal Model for adaptive Multi-Material Decomposition (MMD) to tackle this challenge. This model utilizes the semantic capabilities of text embeddings from Contrastive Language-Image Pre-training (CLIP), allowing a single network to manage structured feature embedding for multiple materials. A novel Siamese encoder and differential map fusion technique have also been integrated to enhance the decomposition accuracy while maintaining robustness across various conditions. Experiments on the simulated and physical patient studies have evidenced our model's superiority over traditional methods. Notably, it has significantly improved the Dice Similarity Coefficient—4.1%. These results underscore the potential of our network in clinical MMD applications, suggesting a promising avenue for enhancing DECT imaging analysis.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"349-361"},"PeriodicalIF":4.2,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143716398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-17 | DOI: 10.1109/TCI.2025.3531729
Tingting Wu;Simiao Liu;Hao Zhang;Tieyong Zeng
In recent years, plug-and-play (PnP) approaches have emerged as an appealing strategy for magnetic resonance imaging (MRI) reconstruction. Compared with traditional compressed sensing methods, these approaches can leverage innovative denoisers to exploit the richer structure of medical images. However, most state-of-the-art networks are not able to adaptively remove noise at each level. To solve this problem, we propose a PnP-based joint denoising network trained to estimate the noise distribution, enabling efficient, flexible, and accurate reconstruction. The first subnetwork's ability to estimate complex distributions is used to implicitly learn noise features, effectively tackling the difficulty of precisely characterizing the unknown noise distribution. The second subnetwork builds on the first and denoises and reconstructs the image once the noise distribution has been obtained. Specifically, the hyperparameter is dynamically adjusted to regulate the denoising level at each iteration, ensuring the convergence of our model. This step gradually removes image noise while simultaneously using prior knowledge extracted from the frequency domain to enhance spatial detail. Experimental results show significant improvements in quantitative metrics and visual quality on different datasets.
{"title":"Estimation-Denoising Integration Network Architecture With Updated Parameter for MRI Reconstruction","authors":"Tingting Wu;Simiao Liu;Hao Zhang;Tieyong Zeng","doi":"10.1109/TCI.2025.3531729","DOIUrl":"https://doi.org/10.1109/TCI.2025.3531729","url":null,"abstract":"In recent years, plug-and-play (PnP) approaches have emerged as an appealing strategy for recovering magnetic resonance imaging. Compared with traditional compressed sensing methods, these approaches can leverage innovative denoisers to exploit the richer structure of medical images. However, most state-of-the-art networks are not able to adaptively remove noise at each level. To solve this problem, we propose a joint denoising network based on PnP trained to evaluate the noise distribution, realizing efficient, flexible, and accurate reconstruction. The ability of the first subnetwork to estimate complex distributions is utilized to implicitly learn noisy features, effectively tackling the difficulty of precisely delineating the obscure noise law. The second subnetwork builds on the first network and can denoise and reconstruct the image after obtaining the noise distribution. Precisely, the hyperparameter is dynamically adjusted to regulate the denoising level throughout each iteration, ensuring the convergence of our model. This step can gradually remove the image noise and use previous knowledge extracted from the frequency domain to enhance spatial particulars simultaneously. The experimental results significantly improve quantitative metrics and visual performance on different datasets.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"142-153"},"PeriodicalIF":4.2,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-10 | DOI: 10.1109/TCI.2025.3525960
Ruizhi Hou;Fang Li
Though fully-supervised deep learning methods have made remarkable achievements in accelerated magnetic resonance imaging (MRI) reconstruction, fully-sampled or high-quality data is unavailable in many scenarios. Zero-shot learning enables training on under-sampled data. However, the limited information in under-sampled data inhibits the neural network from realizing its full potential. This paper proposes a novel learning framework to enhance the diversity of the prior learned in zero-shot learning and improve the reconstruction quality. It consists of three stages: multi-weighted zero-shot ensemble learning, denoising knowledge transfer, and model-guided reconstruction. In the first stage, the ensemble models are trained using a multi-weighted loss function in k-space, yielding results with higher quality and diversity. In the second stage, we propose to use a deep denoiser to distill the knowledge in the ensemble models. Additionally, the denoiser is initialized using weights pre-trained on natural images, combining external knowledge with the information from under-sampled data. In the third stage, the denoiser is plugged into the iterative algorithm to produce the final reconstructed image. Extensive experiments demonstrate that our proposed framework surpasses existing zero-shot methods and can flexibly adapt to different datasets. In multi-coil reconstruction, our proposed zero-shot learning framework outperforms state-of-the-art denoising-based methods.
{"title":"Denoising Knowledge Transfer Model for Zero-Shot MRI Reconstruction","authors":"Ruizhi Hou;Fang Li","doi":"10.1109/TCI.2025.3525960","DOIUrl":"https://doi.org/10.1109/TCI.2025.3525960","url":null,"abstract":"Though fully-supervised deep learning methods have made remarkable achievements in accelerated magnetic resonance imaging (MRI) reconstruction, the fully-sampled or high-quality data is unavailable in many scenarios. Zero-shot learning enables training on under-sampled data. However, the limited information in under-sampled data inhibits the neural network from realizing its full potential. This paper proposes a novel learning framework to enhance the diversity of the learned prior in zero-shot learning and improve the reconstruction quality. It consists of three stages: multi-weighted zero-shot ensemble learning, denoising knowledge transfer, and model-guided reconstruction. In the first stage, the ensemble models are trained using a multi-weighted loss function in k-space, yielding results with higher quality and diversity. In the second stage, we propose to use the deep denoiser to distill the knowledge in the ensemble models. Additionally, the denoiser is initialized using weights pre-trained on nature images, combining external knowledge with the information from under-sampled data. In the third stage, the denoiser is plugged into the iteration algorithm to produce the final reconstructed image. Extensive experiments demonstrate that our proposed framework surpasses existing zero-shot methods and can flexibly adapt to different datasets. In multi-coil reconstruction, our proposed zero-shot learning framework outperforms the state-of-the-art denoising-based methods.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"52-64"},"PeriodicalIF":4.2,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-10 | DOI: 10.1109/TCI.2025.3527156
Yanchen Dong;Ruiqin Xiong;Xiaopeng Fan;Shuyuan Zhu;Jin Wang;Tiejun Huang
As a neuromorphic vision sensor with ultra-high temporal resolution, the spike camera shows great potential in high-speed imaging. To capture color information of dynamic scenes, the color spike camera (CSC) has been invented with a Bayer-pattern color filter array (CFA) on the sensor. Some spike camera reconstruction methods try to train end-to-end models on massive synthetic data pairs. However, there are gaps between synthetic and real-world captured data, and the distribution of the training data impacts model generalizability. In this paper, we propose a zero-shot learning-based method for CSC reconstruction that restores color images from a Bayer-pattern spike stream without pre-training. As the Bayer-pattern spike stream consists of binary signal arrays with missing pixels, we propose to leverage temporally neighboring spike signals at the frame, pixel, and interval levels to restore the color channels. In particular, we employ a zero-shot learning-based scheme to iteratively refine the output via temporally neighboring spike stream clips. To generate high-quality pseudo-labels, we propose to exploit temporally neighboring pixels along the motion direction to estimate the missing pixels. Besides, a temporally neighboring spike interval-based representation is developed to extract temporal and color features from the binary Bayer-pattern spike stream. Experimental results on real-world captured data demonstrate that our method can restore color images with better visual quality than competing methods.
{"title":"Dynamic Scene Reconstruction for Color Spike Camera via Zero-Shot Learning","authors":"Yanchen Dong;Ruiqin Xiong;Xiaopeng Fan;Shuyuan Zhu;Jin Wang;Tiejun Huang","doi":"10.1109/TCI.2025.3527156","DOIUrl":"https://doi.org/10.1109/TCI.2025.3527156","url":null,"abstract":"As a neuromorphic vision sensor with ultra-high temporal resolution, spike camera shows great potential in high-speed imaging. To capture color information of dynamic scenes, color spike camera (CSC) has been invented with a Bayer-pattern color filter array (CFA) on the sensor. Some spike camera reconstruction methods try to train end-to-end models by massive synthetic data pairs. However, there are gaps between synthetic and real-world captured data. The distribution of training data impacts model generalizability. In this paper, we propose a zero-shot learning-based method for CSC reconstruction to restore color images from a Bayer-pattern spike stream without pre-training. As the Bayer-pattern spike stream consists of binary signal arrays with missing pixels, we propose to leverage temporally neighboring spike signals of frame, pixel and interval levels to restore color channels. In particular, we employ a zero-shot learning-based scheme to iteratively refine the output via temporally neighboring spike stream clips. To generate high-quality pseudo-labels, we propose to exploit temporally neighboring pixels along the motion direction to estimate the missing pixels. Besides, a temporally neighboring spike interval-based representation is developed to extract temporal and color features from the binary Bayer-pattern spike stream. Experimental results on real-world captured data demonstrate that our method can restore color images with better visual quality than compared methods.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"129-141"},"PeriodicalIF":4.2,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-09 | DOI: 10.1109/TCI.2025.3527880
Tiancheng Li;Qiurong Yan;Yi Li;Jinwei Yan
The Deep Unfolding Network (DUN) has achieved great success in the image Compressed Sensing (CS) field, benefiting from its strong interpretability and performance. However, existing DUNs suffer from limited information transmission capacity as their structures grow increasingly complex, leading to undesirable results. Besides, current DUNs are mostly established based on one specific optimization algorithm, which hampers the development and understanding of DUNs. In this paper, we propose a new unfolding formula combining the Approximate Message Passing (AMP) algorithm and Range-Nullspace Decomposition (RND), which offers new insights for DUN design. To maximize information transmission and utilization, we propose a novel High-Throughput Decomposition-Inspired Deep Unfolding Network (HTDIDUN) based on the new formula. Specifically, we design a powerful Nullspace Information Extractor (NIE) with high-throughput transmission and stacked residual channel attention blocks. By modulating the dimension of the feature space, we provide three implementations ranging from small to large. Extensive experiments on natural and medical images demonstrate that our HTDIDUN family members outperform other state-of-the-art methods by a large margin. Our code and pre-trained models are available on GitHub to facilitate further exploration.
{"title":"High-Throughput Decomposition-Inspired Deep Unfolding Network for Image Compressed Sensing","authors":"Tiancheng Li;Qiurong Yan;Yi Li;Jinwei Yan","doi":"10.1109/TCI.2025.3527880","DOIUrl":"https://doi.org/10.1109/TCI.2025.3527880","url":null,"abstract":"Deep Unfolding Network (DUN) has achieved great success in the image Compressed Sensing (CS) field benefiting from its great interpretability and performance. However, existing DUNs suffer from limited information transmission capacity with increasingly complex structures, leading to undesirable results. Besides, current DUNs are mostly established based on one specific optimization algorithm, which hampers the development and understanding of DUN. In this paper, we propose a new unfolding formula combining the Approximate Message Passing algorithm (AMP) and Range-Nullspace Decomposition (RND), which offers new insights for DUN design. To maximize information transmission and utilization, we propose a novel High-Throughput Decomposition-Inspired Deep Unfolding Network (HTDIDUN) based on the new formula. Specifically, we design a powerful Nullspace Information Extractor (NIE) with high-throughput transmission and stacked residual channel attention blocks. By modulating the dimension of the feature space, we provide three implementations from small to large. Extensive experiments on natural and medical images manifest that our HTDIDUN family members outperform other state-of-the-art methods by a large margin. Our codes and pre-trained models are available on GitHub to facilitate further exploration.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"89-100"},"PeriodicalIF":4.2,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}