Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery.
Pub Date : 2024-07-11, DOI: 10.1109/TMI.2024.3426953
Hongqiu Wang, Guang Yang, Shichen Zhang, Jing Qin, Yike Guo, Bo Xu, Yueming Jin, Lei Zhu
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they generate segmentation masks for all instruments simultaneously and therefore cannot specify a target object or support an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instruments in each video frame, referred to by a given language expression. This interactive capability offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To this end, this paper constructs two surgical video datasets to promote RSVIS research. We then devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work utilized only video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., the textual description and the video frame) and thereby facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets show that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (Git).
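The abstract does not include implementation details; as a rough illustration of the general idea of correlating a referring expression with frame features in a relation-aware module, here is a minimal cross-attention sketch in PyTorch. The module name, dimensions, and design are assumptions for illustration, not the authors' GRM.

```python
import torch
import torch.nn as nn

class CrossModalRelation(nn.Module):
    """Toy relation module: every visual token attends to the words of the
    referring expression, loosely mirroring the idea of correlating textual
    and visual cues (illustrative sketch, not the paper's GRM)."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (B, N, D) flattened frame features; text: (B, L, D) word features
        fused, _ = self.attn(query=visual, key=text, value=text)
        return self.norm(visual + fused)  # residual fusion of language-conditioned features


# Example: an 8x8 feature map (64 tokens) conditioned on a 12-word expression.
module = CrossModalRelation(dim=256)
out = module(torch.randn(2, 64, 256), torch.randn(2, 12, 256))
print(out.shape)  # torch.Size([2, 64, 256])
```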
Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering.
Pub Date : 2024-07-09, DOI: 10.1109/TMI.2024.3425533
Linqin Cai, Haodu Fang, Nuoying Xu, Bo Ren
Medical Visual Question Answering (VQA-Med) is a challenging task that involves answering clinical questions related to medical images. However, most current VQA-Med methods ignore the causal correlation between specific lesion or abnormality features and answers, while also failing to provide accurate explanations for their decisions. To explore the interpretability of VQA-Med, this paper proposes a novel CCIS-MVQA model for VQA-Med based on a counterfactual causal-effect intervention strategy. This model consists of a modified ResNet for image feature extraction, a GloVe decoder for question feature extraction, a bilinear attention network for vision-language feature fusion, and an interpretability generator for producing the interpretability and prediction results. The proposed CCIS-MVQA introduces a layer-wise relevance propagation method to automatically generate counterfactual samples. Additionally, CCIS-MVQA applies counterfactual causal reasoning throughout the training phase to enhance interpretability and generalization. Extensive experiments on three benchmark datasets show that CCIS-MVQA outperforms state-of-the-art methods. Abundant visualization results are provided to analyze the interpretability and performance of CCIS-MVQA.
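Bilinear attention between image and question features is a well-known fusion pattern; below is a minimal low-rank bilinear pooling sketch in PyTorch. The dimensions, rank, and class name are assumptions for illustration, not the CCIS-MVQA implementation.

```python
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    """Minimal low-rank bilinear fusion of an image feature and a question
    feature, the general idea behind bilinear attention networks
    (illustrative sketch; dimensions and rank are assumptions)."""

    def __init__(self, img_dim=2048, q_dim=300, rank=512, out_dim=1024):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, rank)
        self.q_proj = nn.Linear(q_dim, rank)
        self.out = nn.Linear(rank, out_dim)

    def forward(self, img_feat, q_feat):
        # Element-wise product of projected features approximates a bilinear form.
        joint = torch.tanh(self.img_proj(img_feat)) * torch.tanh(self.q_proj(q_feat))
        return self.out(joint)


fusion = LowRankBilinearFusion()
z = fusion(torch.randn(4, 2048), torch.randn(4, 300))
print(z.shape)  # torch.Size([4, 1024])
```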
Attribute Prototype-guided Iterative Scene Graph for Explainable Radiology Report Generation.
Pub Date : 2024-07-08, DOI: 10.1109/TMI.2024.3424505
Ke Zhang, Yan Yang, Jun Yu, Jianping Fan, Hanliang Jiang, Qingming Huang, Weidong Han
The potential of automated radiology report generation in alleviating the time-consuming tasks of radiologists is increasingly being recognized in medical practice. Existing report generation methods have evolved from using image-level features to the latest approach of utilizing anatomical regions, significantly enhancing interpretability. However, directly and simplistically using region features for report generation compromises the capability of relation reasoning and overlooks the common attributes potentially shared across regions. To address these limitations, we propose a novel region-based Attribute Prototype-guided Iterative Scene Graph generation framework (AP-ISG) for report generation, which uses scene graph generation as an auxiliary task to further enhance interpretability and relational reasoning capability. The core components of AP-ISG are the Iterative Scene Graph Generation (ISGG) module and the Attribute Prototype-guided Learning (APL) module. Specifically, ISGG employs an autoregressive scheme for structural edge reasoning and a contextualization mechanism for relational reasoning. APL enhances intra-prototype matching and reduces inter-prototype semantic overlap in the visual space to fully model the potential attribute commonalities among regions. Extensive experiments on the MIMIC-CXR dataset with Chest ImaGenome annotations demonstrate the superiority of AP-ISG across multiple metrics.
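The described prototype objective (tighten intra-prototype matching, reduce inter-prototype semantic overlap) can be illustrated by a simple loss that pulls region features toward their assigned prototype and pushes distinct prototypes apart. The formulation below is a generic sketch under assumed shapes and margin, not the paper's APL module.

```python
import torch
import torch.nn.functional as F

def prototype_loss(features, labels, prototypes, margin=0.5):
    """Toy attribute-prototype objective (illustrative, not the paper's APL):
    - pull each region feature toward its attribute prototype (intra-prototype matching)
    - push normalized prototypes apart so they overlap less (inter-prototype separation)
    features: (N, D) region features, labels: (N,) prototype indices, prototypes: (K, D)."""
    pull = F.mse_loss(features, prototypes[labels])           # intra-prototype term
    p = F.normalize(prototypes, dim=1)
    sim = p @ p.t()                                           # (K, K) cosine similarities
    off_diag = sim - torch.eye(len(prototypes), device=sim.device)
    push = F.relu(off_diag - margin).mean()                   # penalize overly similar prototypes
    return pull + push


feats = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
protos = torch.randn(10, 128, requires_grad=True)
print(prototype_loss(feats, labels, protos))
```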
Carotid Vessel Wall Segmentation Through Domain Aligner, Topological Learning, and Segment Anything Model for Sparse Annotation in MR Images.
Pub Date : 2024-07-08, DOI: 10.1109/TMI.2024.3424884
Xibao Li, Xi Ouyang, Jiadong Zhang, Zhongxiang Ding, Yuyao Zhang, Zhong Xue, Feng Shi, Dinggang Shen
Medical image analysis poses significant challenges due to the limited availability of clinical data, which is crucial for training accurate models. This limitation is further compounded by the specialized and labor-intensive nature of the data annotation process. For example, although computed tomography angiography (CTA) is popular for diagnosing atherosclerosis and has an abundance of annotated datasets, magnetic resonance (MR) images offer better visualization for soft plaque and vessel wall characterization. However, the higher cost and limited accessibility of MR, as well as the time-consuming nature of manual labeling, result in fewer annotated datasets. To address these issues, we formulate a multi-modal transfer learning network, named MT-Net, designed to learn from unpaired CTA and sparsely-annotated MR data. Additionally, we harness the Segment Anything Model (SAM) to synthesize additional MR annotations, enriching the training process. Specifically, our method first segments vessel lumen regions and then precisely characterizes the carotid artery vessel walls, thereby ensuring both segmentation accuracy and clinical relevance. We validated our method through rigorous experimentation on publicly available datasets from the COSMOS and CARE-II challenges, demonstrating its superior performance compared to existing state-of-the-art techniques.
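One common way to synthesize extra annotations with SAM, as the abstract mentions, is to prompt it with a point or box derived from a sparse annotation and keep the predicted mask as a pseudo-label. The snippet below sketches that idea with the public segment_anything package; the checkpoint path, prompt choice, and how the paper actually drives SAM are assumptions.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM backbone (the checkpoint path is a placeholder, not from the paper).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def pseudo_label(image_rgb: np.ndarray, click_xy) -> np.ndarray:
    """Prompt SAM with a single foreground click (e.g., a sparse vessel annotation)
    and return the best-scoring binary mask as a pseudo-label."""
    predictor.set_image(image_rgb)                      # HxWx3 uint8 image
    masks, scores, _ = predictor.predict(
        point_coords=np.array([click_xy], dtype=np.float32),
        point_labels=np.array([1]),                     # 1 = foreground point
        multimask_output=True,
    )
    return masks[int(np.argmax(scores))]                # (H, W) boolean mask
```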
Unified Multi-Modal Image Synthesis for Missing Modality Imputation.
Pub Date : 2024-07-08, DOI: 10.1109/TMI.2024.3424785
Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, S Kevin Zhou
Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption, and varying imaging protocols often result in incomplete multi-modal images, limiting the use of multi-modal data for clinical purposes. To address this issue, in this paper we propose a novel unified multi-modal image synthesis method for missing modality imputation. Our method adopts a generative adversarial architecture, which aims to synthesize missing modalities from any combination of available ones with a single model. To this end, we specifically design a Commonality- and Discrepancy-Sensitive Encoder for the generator to exploit both modality-invariant and modality-specific information contained in the input modalities. The incorporation of both types of information facilitates the generation of images with consistent anatomy and realistic details of the desired distribution. In addition, we propose a Dynamic Feature Unification Module to integrate information from a varying number of available modalities, which makes the network robust to randomly missing modalities. The module performs both hard integration and soft integration, ensuring the effectiveness of feature combination while avoiding information loss. Verified on two public multi-modal magnetic resonance datasets, the proposed method is effective in handling various synthesis tasks and shows superior performance compared to previous methods.
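A generic way to combine features from a variable set of available modalities with both hard and soft integration is element-wise max pooling (hard) plus attention-weighted averaging (soft). The PyTorch sketch below is an assumption-laden illustration of that idea, not the paper's Dynamic Feature Unification Module.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Illustrative fusion over a variable number of modality features:
    hard integration = element-wise max, soft integration = learned attention average."""

    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)     # per-modality attention score
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, feats):              # feats: (B, M, D), M = number of available modalities
        hard = feats.max(dim=1).values                   # (B, D) hard integration
        w = torch.softmax(self.score(feats), dim=1)      # (B, M, 1) soft weights
        soft = (w * feats).sum(dim=1)                    # (B, D) soft integration
        return self.merge(torch.cat([hard, soft], dim=-1))


fuse = DynamicFusion()
print(fuse(torch.randn(2, 3, 256)).shape)  # works for any M: torch.Size([2, 256])
```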
HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation.
Pub Date : 2024-07-08, DOI: 10.1109/TMI.2024.3424471
Tao Chen, Chenhui Wang, Zhihao Chen, Yiming Lei, Hongming Shan
Medical image segmentation has been significantly advanced by the rapid development of deep learning (DL) techniques. Existing DL-based segmentation models are typically discriminative; i.e., they aim to learn a mapping from the input image to segmentation masks. However, these discriminative methods neglect the underlying data distribution and intrinsic class characteristics, and suffer from an unstable feature space. In this work, we propose to complement discriminative segmentation methods with knowledge of the underlying data distribution from generative models. To that end, we propose a novel hybrid diffusion framework for medical image segmentation, termed HiDiff, which can synergize the strengths of existing discriminative segmentation models and new generative diffusion models. HiDiff comprises two key components: a discriminative segmentor and a diffusion refiner. First, we utilize any conventional trained segmentation model as the discriminative segmentor, which provides a segmentation mask prior for the diffusion refiner. Second, we propose a novel binary Bernoulli diffusion model (BBDM) as the diffusion refiner, which can effectively, efficiently, and interactively refine the segmentation mask by modeling the underlying data distribution. Third, we train the segmentor and BBDM in an alternate-collaborative manner so that they mutually boost each other. Extensive experimental results on abdominal organ, brain tumor, polyp, and retinal vessel segmentation datasets, covering four widely used modalities, demonstrate the superior performance of HiDiff over existing medical segmentation algorithms, including state-of-the-art transformer- and diffusion-based ones. In addition, HiDiff excels at segmenting small objects and generalizing to new datasets. Source code is available at https://github.com/takimailto/HiDiff.
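A binary (Bernoulli) diffusion over segmentation masks can be illustrated by a forward corruption step that, with a schedule-dependent probability, resamples each pixel from Bernoulli(0.5) while otherwise keeping its clean value. The code below is a toy forward process only; the schedule and parameterization are assumptions, not the paper's BBDM.

```python
import torch

def bernoulli_forward(x0: torch.Tensor, t: int, betas: torch.Tensor) -> torch.Tensor:
    """Toy forward step of a binary diffusion: with probability (1 - alpha_bar_t) a pixel
    is resampled from Bernoulli(0.5); otherwise it keeps its clean value x0 in {0, 1}.
    Illustrative corruption process, not the paper's exact BBDM."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    keep = torch.bernoulli(torch.full_like(x0, float(alpha_bar)))   # 1 = keep clean pixel
    noise = torch.bernoulli(torch.full_like(x0, 0.5))               # uniform binary noise
    return keep * x0 + (1.0 - keep) * noise


mask = (torch.rand(1, 1, 64, 64) > 0.5).float()   # a binary "segmentation mask"
betas = torch.linspace(1e-4, 0.02, 1000)          # linear noise schedule (assumed)
noisy = bernoulli_forward(mask, t=500, betas=betas)
print(noisy.unique())                             # tensor([0., 1.])
```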
Joint regional uptake quantification of thorium-227 and radium-223 using a multiple-energy-window projection-domain quantitative SPECT method.
Pub Date : 2024-07-05, DOI: 10.1109/TMI.2024.3420228
Zekun Li, Nadia Benabdallah, Richard Laforest, Richard L Wahl, Daniel L J Thorek, Abhinav K Jha
Thorium-227 (227Th)-based α-particle radiopharmaceutical therapies (α-RPTs) are currently being investigated in several clinical and pre-clinical studies. After administration, 227Th decays to 223Ra, another α-particle-emitting isotope, which redistributes within the patient. Reliable dose quantification of both 227Th and 223Ra is clinically important, and SPECT can perform this quantification since these isotopes also emit X- and γ-ray photons. However, reliable quantification is challenging for several reasons: the orders-of-magnitude lower activity compared to conventional SPECT, resulting in a very low number of detected counts; the presence of multiple photopeaks; substantial overlap in the emission spectra of these isotopes; and the image-degrading effects in SPECT. To address these issues, we propose a multiple-energy-window projection-domain quantification (MEW-PDQ) method that jointly estimates the regional activity uptake of both 227Th and 223Ra directly from the SPECT projection data of multiple energy windows. We evaluated the method with realistic simulation studies conducted with anthropomorphic digital phantoms, including a virtual imaging trial, in the context of imaging patients with bone metastases of prostate cancer treated with 227Th-based α-RPTs. The proposed method yielded reliable (accurate and precise) regional uptake estimates of both isotopes and outperformed state-of-the-art methods across different lesion sizes and contrasts, as well as in the virtual imaging trial. This reliable performance was also observed with moderate levels of intra-regional heterogeneous uptake and with moderate inaccuracies in the definitions of the support of various regions. Additionally, we demonstrated the effectiveness of using multiple energy windows and showed that the variance of the uptake estimated with the proposed method approached the theoretical limit defined by the Cramér-Rao lower bound. These results provide strong evidence in support of this method for reliable uptake quantification in 227Th-based α-RPTs.
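The core idea of projection-domain joint estimation, recovering the regional uptakes of two isotopes with overlapping spectra directly from multi-energy-window Poisson data, can be illustrated with a tiny maximum-likelihood fit. The sensitivity matrix, counts, and SciPy-based optimization below are purely illustrative and are not the MEW-PDQ implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Made-up sensitivity matrix: expected counts per energy window (rows) per unknown
# (columns = [Th-227 uptake, Ra-223 uptake] for one region); overlapping emission
# spectra are what make the two columns correlated.
A = np.array([[0.8, 0.3],
              [0.5, 0.6],
              [0.2, 0.9],
              [0.1, 0.4]])
true_uptake = np.array([40.0, 25.0])
counts = rng.poisson(A @ true_uptake)          # low-count Poisson measurements

def neg_log_likelihood(x):
    lam = A @ x
    return np.sum(lam - counts * np.log(lam))  # Poisson NLL up to a constant

res = minimize(neg_log_likelihood, x0=np.array([10.0, 10.0]),
               bounds=[(1e-6, None), (1e-6, None)])
print("estimated uptakes:", res.x)             # jointly recovered Th-227 / Ra-223 activities
```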
A denoising diffusion probabilistic model for metal artifact reduction in CT.
Pub Date : 2024-07-04, DOI: 10.1109/TMI.2024.3416398
Grigorios M Karageorgos, Jiayong Zhang, Nils Peters, Wenjun Xia, Chuang Niu, Harald Paganetti, Ge Wang, Bruno De Man
The presence of metal objects leads to corrupted CT projection measurements, resulting in metal artifacts in the reconstructed CT images. AI promises to offer improved solutions to estimate missing sinogram data for metal artifact reduction (MAR), as previously shown with convolutional neural networks (CNNs) and generative adversarial networks (GANs). Recently, denoising diffusion probabilistic models (DDPM) have shown great promise in image generation tasks, potentially outperforming GANs. In this study, a DDPM-based approach is proposed for inpainting of missing sinogram data for improved MAR. The proposed model is unconditionally trained, free from information on metal objects, which can potentially enhance its generalization capabilities across different types of metal implants compared to conditionally trained approaches. The performance of the proposed technique was evaluated and compared to the state-of-the-art normalized MAR (NMAR) approach as well as to CNN-based and GAN-based MAR approaches. The DDPM-based approach provided significantly higher SSIM and PSNR, as compared to NMAR (SSIM: p < 10^-26; PSNR: p < 10^-21), the CNN (SSIM: p < 10^-25; PSNR: p < 10^-9) and the GAN (SSIM: p < 10^-6; PSNR: p < 0.05) methods. The DDPM-MAR technique was further evaluated based on clinically relevant image quality metrics on clinical CT images with virtually introduced metal objects and metal artifacts, demonstrating superior quality relative to the other three models. In general, the AI-based techniques showed improved MAR performance compared to the non-AI-based NMAR approach. The proposed methodology shows promise in enhancing the effectiveness of MAR, and therefore improving the diagnostic accuracy of CT.
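Sinogram inpainting with an unconditionally trained DDPM is often done by running the reverse diffusion while, at every step, overwriting the uncorrupted (metal-free) sinogram bins with an appropriately noised copy of the measured data, so that only the metal-trace region is synthesized. The loop below sketches that general idea with a placeholder denoising network; the sampler, schedule, shapes, and the denoiser itself are assumptions, not the paper's model.

```python
import torch

def inpaint_sinogram(denoiser, sino, trace_mask, betas):
    """Illustrative inpainting loop in the spirit of RePaint (not the paper's sampler).
    sino: measured sinogram (B,1,H,W); trace_mask: 1 where data is corrupted by metal."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(sino)
    for t in reversed(range(len(betas))):
        # Keep measured bins consistent: noise the known data to level t and paste it in.
        known = torch.sqrt(alpha_bar[t]) * sino + torch.sqrt(1 - alpha_bar[t]) * torch.randn_like(sino)
        x = trace_mask * x + (1 - trace_mask) * known
        # Standard DDPM reverse step using the (placeholder) noise predictor.
        eps = denoiser(x, t)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        x = mean + (torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else 0.0)
    return trace_mask * x + (1 - trace_mask) * sino   # final composite sinogram


# Dummy usage with a placeholder "denoiser" (a trained noise-prediction network would go here).
denoiser = lambda x, t: torch.zeros_like(x)
sino = torch.randn(1, 1, 64, 64)
trace = torch.zeros_like(sino); trace[..., 28:36] = 1.0   # hypothetical metal-trace region
betas = torch.linspace(1e-4, 0.02, 50)
print(inpaint_sinogram(denoiser, sino, trace, betas).shape)
```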
Mitigating Aberration-Induced Noise: A Deep Learning-Based Aberration-to-Aberration Approach.
Pub Date : 2024-07-03, DOI: 10.1109/TMI.2024.3422027
Mostafa Sharifzadeh, Sobhan Goudarzi, An Tang, Habib Benali, Hassan Rivaz
One of the primary sources of suboptimal image quality in ultrasound imaging is phase aberration. It is caused by spatial changes in sound speed over a heterogeneous medium, which disturbs the transmitted waves and prevents coherent summation of echo signals. Obtaining non-aberrated ground truths in real-world scenarios can be extremely challenging, if not impossible. This challenge hinders the performance of deep learning-based techniques due to the domain shift between simulated and experimental data. Here, for the first time, we propose a deep learning-based method that does not require ground truth to correct the phase aberration problem and, as such, can be directly trained on real data. We train a network wherein both the input and target output are randomly aberrated radio frequency (RF) data. Moreover, we demonstrate that a conventional loss function such as mean square error is inadequate for training such a network to achieve optimal performance. Instead, we propose an adaptive mixed loss function that employs both B-mode and RF data, resulting in more efficient convergence and enhanced performance. Finally, we publicly release our dataset, comprising over 180,000 aberrated single plane-wave images (RF data), wherein phase aberrations are modeled as near-field phase screens. Although not utilized in the proposed method, each aberrated image is paired with its corresponding aberration profile and the non-aberrated version, aiming to mitigate the data scarcity problem in developing deep learning-based techniques for phase aberration correction. Source code and trained model are also available along with the dataset at http://code.sonography.ai/main-aaa.
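A mixed RF/B-mode objective like the one described can be illustrated by differentiably forming a B-mode image from RF data (envelope via an FFT-based Hilbert transform, then log compression) and mixing the two errors. The sketch below uses a fixed mixing weight; the paper's adaptive weighting and other details are not reproduced, and all names are assumptions.

```python
import torch

def bmode(rf: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Differentiable B-mode from RF lines (..., n_samples): analytic-signal envelope
    via an FFT-based Hilbert transform, followed by log compression."""
    n = rf.shape[-1]
    spec = torch.fft.fft(rf, dim=-1)
    h = torch.zeros(n, dtype=spec.dtype, device=rf.device)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = torch.abs(torch.fft.ifft(spec * h, dim=-1))
    return 20.0 * torch.log10(envelope + eps)

def mixed_loss(rf_pred, rf_target, alpha=0.5):
    """Illustrative mixed objective: weighted sum of RF-domain and B-mode-domain MSE
    (fixed weight here; the paper's adaptive scheme is not reproduced)."""
    rf_term = torch.mean((rf_pred - rf_target) ** 2)
    bm_term = torch.mean((bmode(rf_pred) - bmode(rf_target)) ** 2)
    return alpha * rf_term + (1.0 - alpha) * bm_term


print(mixed_loss(torch.randn(2, 128, 2048), torch.randn(2, 128, 2048)))
```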
A Convolutional-Transformer Model for FFR and iFR Assessment From Coronary Angiography
Pub Date : 2024-07-02, DOI: 10.1109/TMI.2024.3383283
Raffaele Mineo;F. Proietto Salanitri;G. Bellitto;I. Kavasidis;O. De Filippo;M. Millesimo;G. M. De Ferrari;M. Aldinucci;D. Giordano;S. Palazzo;F. D’Ascenzo;C. Spampinato
The quantification of stenosis severity from X-ray catheter angiography is a challenging task. Indeed, it requires fully understanding the lesion's geometry by analyzing the dynamics of the contrast material, relying only on visual observation by clinicians. To support decision making for cardiac intervention, we propose a hybrid CNN-Transformer model for the assessment of angiography-based non-invasive fractional flow reserve (FFR) and instantaneous wave-free ratio (iFR) of intermediate coronary stenosis. Our approach predicts whether a coronary artery stenosis is hemodynamically significant and provides direct FFR and iFR estimates. This is achieved through a combination of regression and classification branches that forces the model to focus on the cut-off region of FFR (around the 0.8 FFR value), which is highly critical for decision-making. We also propose a spatio-temporal factorization mechanism that redesigns the transformer's self-attention mechanism to capture both local spatial and temporal interactions between vessel geometry, blood flow dynamics, and lesion morphology. The proposed method achieves state-of-the-art performance on a dataset of 778 exams from 389 patients. Unlike existing methods, our approach employs a single angiography view and does not require knowledge of the key frame; supervision at training time is provided by a classification loss (based on a threshold of the FFR/iFR values) and a regression loss for direct estimation. Finally, the analysis of model interpretability and calibration shows that, in spite of the complexity of angiographic imaging data, our method can robustly identify the location of the stenosis and correlate prediction uncertainty with the provided output scores.
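The combination of a regression head with a classification head around the clinical 0.8 FFR cut-off can be written as a simple joint loss; below is a generic PyTorch sketch in which the loss weight, head outputs, and cut-off handling are assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def ffr_joint_loss(reg_pred, cls_logit, ffr_true, cutoff=0.8, lam=1.0):
    """Illustrative joint objective for FFR estimation:
    - MSE on the continuous FFR value (regression branch)
    - BCE on hemodynamic significance, i.e. FFR <= 0.8 (classification branch)
    The weighting lam and the cut-off handling are assumptions."""
    reg_loss = F.mse_loss(reg_pred, ffr_true)
    significant = (ffr_true <= cutoff).float()
    cls_loss = F.binary_cross_entropy_with_logits(cls_logit, significant)
    return reg_loss + lam * cls_loss


reg_pred = torch.tensor([0.78, 0.91])     # predicted FFR values
cls_logit = torch.tensor([0.3, -1.2])     # predicted significance logits
ffr_true = torch.tensor([0.75, 0.93])     # ground-truth invasive FFR
print(ffr_joint_loss(reg_pred, cls_logit, ffr_true))
```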