MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images.

IEEE Transactions on Medical Imaging. Pub Date: 2024-06-20. DOI: 10.1109/TMI.2024.3415032

Yanwu Xu, Li Sun, Wei Peng, Shuyue Jia, Katelyn Morrison, Adam Perer, Afrooz Zandifar, Shyam Visweswaran, Motahhare Eslami, Kayhan Batmanghelich



Abstract

This paper introduces a method for generating high-quality 3D lung CT images guided by text. Diffusion-based generative models are increasingly used in medical imaging, but current state-of-the-art approaches are limited to low-resolution outputs and underuse the abundant information in radiology reports. These reports can enhance generation by providing additional guidance and fine-grained control over image synthesis. However, extending text-guided generation to high-resolution 3D images poses significant challenges in memory use and in preserving anatomical detail. To address the memory issue, we introduce a hierarchical scheme built on a modified UNet architecture: we first synthesize low-resolution images conditioned on the text, and these serve as the foundation for subsequent generators that produce the complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks jointly with the CT images. The model can thus use textual input and segmentation tasks to generate synthetic images. Algorithmic comparisons and blind evaluations by 10 board-certified radiologists indicate that our approach outperforms the most advanced GAN-based and diffusion-based models, especially in accurately retaining crucial anatomical features such as fissure lines and airways. The study has two main objectives: (1) developing a method for creating images from textual prompts and anatomical components, and (2) enabling the generation of new images conditioned on anatomical elements. These advances in image generation can be applied to enhance numerous downstream tasks.
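The memory argument behind the hierarchical scheme, and the idea of generating masks alongside the CT volume, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the side lengths (64 and 256), the four-channel layout, and the helper names are assumptions made for illustration, and nearest-neighbour upsampling merely stands in for the learned generators that produce the full-resolution volume.

```python
import numpy as np


def volume_bytes(side, channels=1, dtype=np.float32):
    """Bytes needed to store one dense (channels, side, side, side) volume."""
    return channels * side**3 * np.dtype(dtype).itemsize


def stack_ct_with_masks(ct, vessel, airway, lobe):
    """Stack a CT volume and three segmentation masks into one 4-channel array.

    Treating the masks as extra channels is an assumed way to generate
    them "in conjunction with" the CT image.
    """
    return np.stack([ct, vessel, airway, lobe], axis=0).astype(np.float32)


def upsample_nearest(volume, factor):
    """Nearest-neighbour upsampling of the three spatial axes.

    Placeholder for a learned second-stage generator; the channel axis
    (axis 0) is left untouched.
    """
    for axis in (1, 2, 3):
        volume = np.repeat(volume, factor, axis=axis)
    return volume


# Hypothetical resolutions; the abstract does not state the actual sizes.
LOW_SIDE, HIGH_SIDE = 64, 256
low_cost = volume_bytes(LOW_SIDE, channels=4)
high_cost = volume_bytes(HIGH_SIDE, channels=4)
print(f"low-res:  {low_cost / 2**20:.0f} MiB per sample")
print(f"high-res: {high_cost / 2**20:.0f} MiB per sample")
print(f"ratio:    {high_cost // low_cost}x")

# Tiny demo volumes keep the example cheap to run.
rng = np.random.default_rng(0)
low_res = stack_ct_with_masks(*(rng.random((8, 8, 8)) for _ in range(4)))
high_res = upsample_nearest(low_res, factor=4)
print(low_res.shape, "->", high_res.shape)
```

Under these assumed sizes, quadrupling the side length multiplies the storage for a single sample by 64, before counting any network activations, which is why a cheap text-conditioned low-resolution stage followed by refinement is attractive.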


