Exploiting GPT for synthetic data generation: An empirical study
Authors: Tony Busker, Sunil Choenni, Mortaza S. Bargh
Journal: Government Information Quarterly, Volume 42, Issue 1, Article 101988
Publication date: 2024-12-19
DOI: 10.1016/j.giq.2024.101988
URL: https://www.sciencedirect.com/science/article/pii/S0740624X24000807
Citation count: 0
Abstract
There are many good reasons to use synthetic data instead of real data for research purposes. These reasons range from the business sensitivity of real data to the increased cost of collecting real data in accordance with GDPR requirements. In this paper, we elaborate upon the potential of the Large Language Model GPT as a tool to generate synthetic data for analytical purposes when no real data is available or accessible. Primarily, we show that by adequately varying the scope of probes, we can generate data of different granularities. To show this, we generated stereotypical data with three levels of granularity by posing more than 18,500 probes to GPT. In total, we generated stereotypical data for eight different views, which can be categorized into three view types corresponding to the three levels of granularity. Secondarily, we show that by varying the scope of probes one can create meaningful information. To show this, we performed a so-called similarity analysis on the generated stereotypical data. We used data visualizations, e.g., heatmaps, to show which views, and which categories within the views, are similar and which are at odds with each other. We elaborate upon the application areas of the insight gained about such similarities and differences. Furthermore, we discuss several other types of analysis that can be performed on the generated stereotypical data.
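The abstract does not say how the similarity analysis was computed, so the following is only a minimal sketch of one way such an analysis could be set up, not the authors' method. It assumes that each category within a view yields a list of generated terms, and it compares categories by the cosine similarity of their term-frequency vectors. The category names, the terms, and the functions `cosine_similarity` and `similarity_matrix` are all hypothetical placeholders. The resulting matrix is the kind of input a heatmap would visualize.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: terms a model might return for each category of one view.
# These values are invented placeholders, not results from the paper.
responses = {
    "category_a": ["t1", "t2", "t3", "t2"],
    "category_b": ["t2", "t3", "t4"],
    "category_c": ["t5", "t6", "t1"],
}


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty category is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(data: dict[str, list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return sorted category names and their pairwise similarity matrix."""
    names = sorted(data)
    counts = {name: Counter(data[name]) for name in names}
    matrix = [
        [cosine_similarity(counts[row], counts[col]) for col in names]
        for row in names
    ]
    return names, matrix


names, matrix = similarity_matrix(responses)
for name, row in zip(names, matrix):
    # Print each row of the matrix; a plotting library could render it as a heatmap.
    print(name, " ".join(f"{value:.2f}" for value in row))
```

Values near 1 mark categories whose generated terms largely overlap, and values near 0 mark categories with little in common. Cosine similarity is one choice among several; a set-based measure such as Jaccard similarity would fit the same structure.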
About the journal:
Government Information Quarterly (GIQ) delves into the convergence of policy, information technology, government, and the public. It explores the impact of policies on government information flows, the role of technology in innovative government services, and the dynamic between citizens and governing bodies in the digital age. GIQ serves as a premier journal, disseminating high-quality research and insights that bridge the realms of policy, information technology, government, and public engagement.