{"title":"DuDoCROP: Dual-Domain CLIP-Assisted Residual Optimization Perception Model for CT Metal Artifact Reduction","authors":"Xinrui Zhang, Ailong Cai, Lei Li, Bin Yan","doi":"arxiv-2408.14342","DOIUrl":null,"url":null,"abstract":"Metal artifacts in computed tomography (CT) imaging pose significant\nchallenges to accurate clinical diagnosis. The presence of high-density\nmetallic implants results in artifacts that deteriorate image quality,\nmanifesting in the forms of streaking, blurring, or beam hardening effects,\netc. Nowadays, various deep learning-based approaches, particularly generative\nmodels, have been proposed for metal artifact reduction (MAR). However, these\nmethods have limited perception ability in the diverse morphologies of\ndifferent metal implants with artifacts, which may generate spurious anatomical\nstructures and exhibit inferior generalization capability. To address the\nissues, we leverage visual-language model (VLM) to identify these morphological\nfeatures and introduce them into a dual-domain CLIP-assisted residual\noptimization perception model (DuDoCROP) for MAR. Specifically, a dual-domain\nCLIP (DuDoCLIP) is fine-tuned on the image domain and sinogram domain using\ncontrastive learning to extract semantic descriptions from anatomical\nstructures and metal artifacts. Subsequently, a diffusion model is guided by\nthe embeddings of DuDoCLIP, thereby enabling the dual-domain prior generation.\nAdditionally, we design prompt engineering for more precise image-text\ndescriptions that can enhance the model's perception capability. Then, a\ndownstream task is devised for the one-step residual optimization and\nintegration of dual-domain priors, while incorporating raw data fidelity.\nUltimately, a new perceptual indicator is proposed to validate the model's\nperception and generation performance. With the assistance of DuDoCLIP, our\nDuDoCROP exhibits at least 63.7% higher generalization capability compared to\nthe baseline model. Numerical experiments demonstrate that the proposed method\ncan generate more realistic image structures and outperform other SOTA\napproaches both qualitatively and quantitatively.","PeriodicalId":501378,"journal":{"name":"arXiv - PHYS - Medical Physics","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Medical Physics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14342","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Metal artifacts in computed tomography (CT) imaging pose significant
challenges to accurate clinical diagnosis. High-density metallic implants
produce artifacts that degrade image quality, manifesting as streaking,
blurring, and beam-hardening effects. Various deep learning-based approaches,
particularly generative models, have been proposed for metal artifact
reduction (MAR). However, these methods have limited ability to perceive the
diverse morphologies of metal implants and their artifacts, and may therefore
generate spurious anatomical structures and generalize poorly. To address
these issues, we leverage a vision-language model (VLM) to identify these
morphological features and introduce them into a dual-domain CLIP-assisted
residual optimization perception model (DuDoCROP) for MAR. Specifically, a
dual-domain
CLIP (DuDoCLIP) is fine-tuned on the image domain and sinogram domain using
contrastive learning to extract semantic descriptions of anatomical
structures and metal artifacts (see the first sketch after the abstract).
Subsequently, a diffusion model is guided by the DuDoCLIP embeddings,
enabling dual-domain prior generation (second sketch).
Additionally, we design prompt engineering to produce more precise image-text
descriptions, which enhances the model's perception capability. A downstream
task is then devised for one-step residual optimization and integration of
the dual-domain priors while incorporating raw-data fidelity (third sketch).
Ultimately, a new perceptual indicator is proposed to validate the model's
perception and generation performance. With the assistance of DuDoCLIP, our
DuDoCROP exhibits at least 63.7% higher generalization capability compared to
the baseline model. Numerical experiments demonstrate that the proposed
method generates more realistic image structures and outperforms other
state-of-the-art (SOTA) approaches both qualitatively and quantitatively.
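
Sketch 1. A minimal illustration (not the authors' released code) of the
DuDoCLIP-style contrastive fine-tuning described above: one CLIP image
encoder per domain (image and sinogram) is aligned with text descriptions of
anatomy and metal artifacts via a symmetric InfoNCE loss. All function and
argument names here are assumptions for illustration.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image/text pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched pairs lie on the diagonal; contrast in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def dual_domain_step(image_encoder, sino_encoder, text_encoder,
                     images, sinograms, img_tokens, sino_tokens, optimizer):
    """One dual-domain update: each branch is pulled toward its own text."""
    loss = (clip_contrastive_loss(image_encoder(images),
                                  text_encoder(img_tokens)) +
            clip_contrastive_loss(sino_encoder(sinograms),
                                  text_encoder(sino_tokens)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()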
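Sketch 2. A schematic of generating a prior with a diffusion model
conditioned on DuDoCLIP embeddings, using a deterministic DDIM-style sampler.
The denoiser interface, step count, and beta schedule are assumptions; the
paper's exact sampler may differ.

import torch

@torch.no_grad()
def sample_prior(denoiser, clip_embedding, shape, n_steps=50, device="cpu"):
    """DDIM-style (eta = 0) sampling guided by a DuDoCLIP embedding.
    denoiser(x_t, t, cond) is assumed to predict the added noise."""
    betas = torch.linspace(1e-4, 2e-2, n_steps, device=device)
    alphas_cum = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(shape, device=device)  # start from pure noise
    for i in reversed(range(n_steps)):
        t = torch.full((shape[0],), i, device=device, dtype=torch.long)
        eps = denoiser(x, t, clip_embedding)  # embedding-conditioned noise
        a_t = alphas_cum[i]
        a_prev = alphas_cum[i - 1] if i > 0 else torch.tensor(1.0, device=device)
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean prior
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic step
    return x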
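Sketch 3. One possible reading (our assumption, not the paper's exact
formulation) of the downstream step: the two dual-domain priors are fused by
a learned residual network in a single pass, then the fused image is refined
so that its forward projection matches the raw sinogram on bins unaffected by
metal. A, residual_net, and metal_trace are assumed, hypothetical inputs.

import torch

def fuse_with_data_fidelity(prior_img, prior_sino_img, residual_net,
                            A, y_raw, metal_trace, lam=1.0, steps=20, lr=1e-2):
    """One-step residual fusion followed by raw-data-fidelity refinement.
    A: differentiable forward projector (callable);
    metal_trace: boolean mask of metal-corrupted sinogram bins."""
    # One-step residual optimization: integrate the two dual-domain priors.
    with torch.no_grad():
        fused = prior_img + residual_net(
            torch.cat([prior_img, prior_sino_img], dim=1))
    x = fused.clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    keep = ~metal_trace  # sinogram bins whose rays avoid the metal
    for _ in range(steps):
        opt.zero_grad()
        fidelity = ((A(x) - y_raw)[keep] ** 2).sum()  # raw-data consistency
        proximity = lam * ((x - fused) ** 2).sum()    # stay near fused prior
        (fidelity + proximity).backward()
        opt.step()
    return x.detach()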