Miguel Rujas, Rodrigo Martín Gómez del Moral Herranz, Giuseppe Fico, Beatriz Merino-Barbancho
International Journal of Medical Informatics, Volume 195, Article 105763. Published 2024-12-17. Impact factor: 3.7 (JCR Q2, Computer Science, Information Systems).
DOI: 10.1016/j.ijmedinf.2024.105763
https://www.sciencedirect.com/science/article/pii/S138650562400426X
Citations: 0
Synthetic data generation in healthcare: A scoping review of reviews on domains, motivations, and future applications
Background
Artificial Intelligence is having a major impact on the healthcare sector. However, one of the primary challenges in implementing this technology is access to high-quality data, which is limited by data collection issues and regulatory constraints; synthetic data are an emerging alternative. While previous research has reviewed synthetic data generation techniques, it has paid limited attention to their applications and the motivations driving their synthesis. A comprehensive review is needed to expand the potential of synthetic data into less explored healthcare areas.
Objective
This review aims to identify the healthcare domains where synthetic data are currently generated, the motivations behind their creation, their future uses, their limitations, and the types of data generated.
Materials and methods
Following the PRISMA-ScR framework, this review analysed literature from the last 10 years within PubMed, Scopus, and Web of Science. Reviews containing information on synthetic data generation in healthcare were screened and analysed. Key healthcare domains, motivations, future uses, and gaps in the literature were identified through a structured data extraction process.
Results
Of the 346 reviews identified, 42 were included for data extraction. Thirteen main domains were identified, with Oncology, Neurology, and Cardiology being the most frequently mentioned. Five primary motivations for synthetic data generation and three major categories of future applications were highlighted. Additionally, unstructured data, particularly images, were found to be the predominant type of synthetic data generated.
Discussion and conclusion
Synthetic data are currently being generated across diverse healthcare domains, showcasing their adaptability and potential. Despite their early stage, synthetic data technologies hold significant promise for future applications. Expanding their use into new domains and less common data types (e.g., video and text) could further enhance their impact. Future work should focus on developing evaluation benchmarks and standardized generative models tailored to specific healthcare domains.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.