Denoising diffusion probabilistic models for addressing data limitations in chest X-ray classification
Evi M.C. Huijben, Josien P.W. Pluim, Maureen A.J.M. van Eijnatten
Informatics in Medicine Unlocked, vol. 50, Article 101575, 2024. DOI: 10.1016/j.imu.2024.101575
https://www.sciencedirect.com/science/article/pii/S235291482400131X
Abstract
Deep learning plays a crucial role in medical image analysis, particularly in tasks such as image classification and segmentation. However, learning from medical imaging datasets presents challenges, including scarcity of labeled examples, class imbalance, and inadequate representation of diverse patient populations. To address these challenges, there has been growing interest in using deep generative models to create synthetic training data, with denoising diffusion probabilistic models (DDPMs) recently gaining attention for their ability to produce realistic, high-quality images. This study explores the potential of a DDPM to generate synthetic chest X-rays for training a multi-label classifier. The results indicate that a conditional DDPM has the potential to produce a realistic training set of synthetic chest X-rays. In addition, the study analyzes how addressing class imbalance affects classification performance. Balancing the synthetic training set increased the overall classification sensitivity from 0.02 to 0.59, but decreased the overall specificity from 0.99 to 0.71. Furthermore, we investigated unconditional pre-training of the DDPM to learn general representations, followed by conditional fine-tuning. The results indicate that this approach allows the amount of labeled training data to be reduced to 25% of the original set. Finally, we demonstrate that fidelity and classification metrics do not consistently exhibit the same trends. Integrating a DDPM into the classification pipeline underscores the benefits of having optimal control over the data and of making efficient use of available unlabeled data. Our research provides insights for making informed decisions about integrating generative models into medical image analysis.
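The abstract reports an "overall" sensitivity and specificity for a multi-label classifier but does not say how per-label results are combined. As a minimal sketch, assuming micro-averaging (pooling true/false positives and negatives over every image-label pair), the two metrics could be computed as follows. The function `micro_sens_spec` and the toy label matrices are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import List, Sequence, Tuple


def micro_sens_spec(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Micro-averaged sensitivity and specificity for multi-label predictions.

    Assumption: rows are images, columns are binary finding labels
    (1 = finding present). TP/FN/TN/FP are pooled over every
    (image, label) pair before dividing.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of images")
    tp = fn = tn = fp = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each image must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # finding present and predicted
            elif t:
                fn += 1  # finding present but missed
            elif p:
                fp += 1  # finding absent but predicted
            else:
                tn += 1  # finding absent and not predicted
    # Undefined (NaN) when there are no positives, or no negatives, at all.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy ground truth: 4 images x 3 findings.
y_true: List[List[int]] = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]

# Hypothetical classifier that never predicts any finding.
all_negative = [[0, 0, 0] for _ in y_true]

# Hypothetical classifier that predicts findings more liberally.
liberal = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

print(micro_sens_spec(y_true, all_negative))  # (0.0, 1.0)
print(micro_sens_spec(y_true, liberal))       # (0.75, 0.75)
```

The toy example only illustrates the direction of the trade-off the abstract describes: a classifier that predicts findings more often can gain sensitivity while losing specificity. If the paper instead averages per-label values (macro-averaging), its overall figures would be computed differently.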
About the journal:
Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.