{"title":"Synthetic Data for Deep Learning in Computer Vision & Medical Imaging: A Means to Reduce Data Bias","authors":"Anthony Paproki, Olivier Salvado, Clinton Fookes","doi":"10.1145/3663759","DOIUrl":null,"url":null,"abstract":"<p>Deep-learning (DL) performs well in computer-vision and medical-imaging automated decision-making applications. A bottleneck of DL stems from the large amount of labelled data required to train accurate models that generalise well. Data scarcity and imbalance are common problems in imaging applications that can lead DL models towards biased decision making. A solution to this problem is synthetic data. Synthetic data is an inexpensive substitute to real data for improved accuracy and generalisability of DL models. This survey reviews the recent methods published in relation to the creation and use of synthetic data for computer-vision and medical-imaging DL applications. The focus will be on applications that utilised synthetic data to improve DL models by either incorporating an increased diversity of data that is difficult to obtain in real life, or by reducing a bias caused by class imbalance. Computer-graphics software and generative networks are the most popular data generation techniques encountered in the literature. We highlight their suitability for typical computer-vision and medical-imaging applications, and present promising avenues for research to overcome their computational and theoretical limitations.</p>","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":null,"pages":null},"PeriodicalIF":23.8000,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3663759","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Deep-learning (DL) performs well in computer-vision and medical-imaging automated decision-making applications. A bottleneck of DL stems from the large amount of labelled data required to train accurate models that generalise well. Data scarcity and imbalance are common problems in imaging applications that can lead DL models towards biased decision making. A solution to this problem is synthetic data. Synthetic data is an inexpensive substitute to real data for improved accuracy and generalisability of DL models. This survey reviews the recent methods published in relation to the creation and use of synthetic data for computer-vision and medical-imaging DL applications. The focus will be on applications that utilised synthetic data to improve DL models by either incorporating an increased diversity of data that is difficult to obtain in real life, or by reducing a bias caused by class imbalance. Computer-graphics software and generative networks are the most popular data generation techniques encountered in the literature. We highlight their suitability for typical computer-vision and medical-imaging applications, and present promising avenues for research to overcome their computational and theoretical limitations.
期刊介绍:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.