Styliani-Christina Fragkouli, Dhwani Solanki, Leyla J Castro, Fotis E Psomopoulos, Núria Queralt-Rosinach, Davide Cirillo, Lisa C Crossman
arXiv - QuanBio - Other Quantitative Biology. Published 2024-07-03. DOI: arxiv-2407.06211 (https://doi.org/arxiv-2407.06211)
Synthetic data: How could it be used for infectious disease research?
Over the last three to five years, it has become possible to generate
synthetic data with machine learning for healthcare-related uses. However,
concerns have been raised about potential harms associated with artificial
dataset generation. These include the potential misuse of generative
artificial intelligence (AI) in fields such as cybercrime, the use of
deepfakes and fake news to deceive or manipulate, and the displacement of
human jobs across various market sectors.

Here, we consider both current and future positive advances and possibilities
with synthetic datasets. Synthetic data offers significant benefits,
particularly for data privacy, research, balancing datasets and reducing bias
in machine learning models. Generative AI (GenAI) is a class of artificial
intelligence capable of creating text, images, video or other data using
generative models. The recent explosion of interest in GenAI was heralded by
the invention and rapid adoption of large language models (LLMs). These
computational models can perform general-purpose language generation and other
natural language processing tasks and are based on transformer architectures,
which marked an evolutionary leap from previous neural network architectures.

Fuelled by the advent of improved GenAI techniques and wide-scale usage, this
is surely the time to consider how synthetic data can be used to advance
infectious disease research. In this commentary, we aim to provide an overview
of the current and future position of synthetic data in infectious disease
research.