Synthetic data: How could it be used for infectious disease research?

arXiv - QuanBio - Other Quantitative Biology | Pub Date: 2024-07-03 | DOI: arxiv-2407.06211

Styliani-Christina Fragkouli, Dhwani Solanki, Leyla J Castro, Fotis E Psomopoulos, Núria Queralt-Rosinach, Davide Cirillo, Lisa C Crossman



Abstract



Over the last three to five years, it has become possible to generate machine learning synthetic data for healthcare-related uses. However, concerns have been raised about potential negative factors associated with artificial dataset generation. These include the potential misuse of generative artificial intelligence (AI) in fields such as cybercrime, the use of deepfakes and fake news to deceive or manipulate, and the displacement of human jobs across various market sectors. Here, we consider both current and future positive advances and possibilities with synthetic datasets. Synthetic data offers significant benefits, particularly for data privacy, research, balancing datasets, and reducing bias in machine learning models. Generative AI (GenAI) is a type of artificial intelligence capable of creating text, images, video or other data using generative models. The recent explosion of interest in GenAI was heralded by the invention and rapid adoption of large language models (LLMs). These computational models can perform general-purpose language generation and other natural language processing tasks, and are based on transformer architectures, which made an evolutionary leap from previous neural network architectures. Fuelled by the advent of improved GenAI techniques and wide-scale usage, this is surely the time to consider how synthetic data can be used to advance infectious disease research. In this commentary we aim to give an overview of the current and future position of synthetic data in infectious disease research.
