Xiaoyan Kui, Bo Liu, Zanbo Sun, Qinsong Li, Min Zhang, Wei Liang, Beiji Zou

Biomedical Signal Processing and Control, Volume 106, Article 107735. Published 2025-03-05. DOI: 10.1016/j.bspc.2025.107735
Med-LVDM: Medical latent variational diffusion model for medical image translation
Learning-based methods for medical image translation have proven effective in addressing the challenge of obtaining complete multimodal medical images in clinical practice, particularly when patients are allergic to contrast agents or suffering from critical illnesses. Recently, diffusion models have exhibited superior performance in various image-generation tasks and are expected to replace generative adversarial networks (GANs) for medical image translation. However, existing methods suffer from unintuitive training objectives and complex network structures that curtail their efficacy in this domain. To address this gap, we propose a novel medical latent variational diffusion model (Med-LVDM) for efficient medical image translation. Firstly, we introduce a new parametric representation based on the variational diffusion model (VDM) and simplify the training objective to a weighted mean squared error between the synthetic and target images, which is more intuitive and requires fewer model parameters. Then, we map the diffusion training and sampling process to the latent space, significantly reducing computational complexity and enhancing the feasibility of clinical applications. Finally, to capture global information rather than focusing solely on local features, we adopt U-ViT as the backbone of Med-LVDM, since it adapts effectively to a latent space that encodes abstract rather than concrete pixel-level information. Extensive qualitative and quantitative results on multi-contrast MRI and cross-modality MRI-CT datasets demonstrate the superior translation quality of our method compared to state-of-the-art approaches. In particular, Med-LVDM achieved its highest SSIM and PSNR of 92.37% and 26.23 dB on the BraTS2018 dataset, 90.18% and 24.55 dB on the IXI dataset, and 91.61% and 25.52 dB on the MRI-CT dataset.
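The core idea described above — diffusing in a latent space and training with a weighted mean-squared-error objective — can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the log-SNR schedule, the toy denoiser (a stand-in for the U-ViT backbone), and the latent codes are all placeholder assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_snr(t):
    # Illustrative linear log-SNR schedule (an assumption, not the paper's schedule):
    # high SNR near t=0 (almost clean), low SNR near t=1 (almost pure noise).
    return 10.0 - 20.0 * t

def alpha_sigma(t):
    # Standard VDM parameterization: alpha^2 = sigmoid(log-SNR), sigma^2 = 1 - alpha^2.
    lam = log_snr(t)
    alpha2 = 1.0 / (1.0 + np.exp(-lam))
    return np.sqrt(alpha2), np.sqrt(1.0 - alpha2)

def weighted_mse_loss(z0, denoise_fn, t):
    """SNR-weighted MSE between the predicted and true clean latent at time t."""
    alpha, sigma = alpha_sigma(t)
    eps = rng.standard_normal(z0.shape)
    zt = alpha * z0 + sigma * eps      # forward diffusion applied in latent space
    z0_hat = denoise_fn(zt, t)         # network predicts the clean latent
    snr = (alpha / sigma) ** 2
    return snr * np.mean((z0_hat - z0) ** 2)

# Toy "denoiser" that just rescales the noisy latent -- a placeholder for U-ViT.
toy_denoiser = lambda zt, t: alpha_sigma(t)[0] * zt

z0 = rng.standard_normal((4, 16))      # pretend latent codes from a pretrained encoder
loss = weighted_mse_loss(z0, toy_denoiser, t=0.3)
```

In a real latent diffusion pipeline, `z0` would come from an autoencoder applied to the source-modality image, and `denoise_fn` would be a trained network conditioned on that source latent; the objective shape, however, is exactly this weighted squared error.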
About the journal:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.