Data Analysis and Synthesis of COVID-19 Patients using Deep Generative Models: A Case Study of Jakarta, Indonesia
B. I. Nasution, Irfan Dwiki Bhaswara, Y. Nugraha, J. Kanggrawan
2022 IEEE International Smart Cities Conference (ISC2), 2022-09-26
DOI: 10.1109/ISC255366.2022.9921948 (https://doi.org/10.1109/ISC255366.2022.9921948)
Abstract
Two years have passed since COVID-19 broke out in Indonesia. During that time, the central and regional governments have used vast amounts of data on COVID-19 patients for policymaking. However, privacy problems can arise when individuals' data are used in this way. It is therefore crucial to keep COVID-19 data private, and synthetic data publishing (SDP) is one way to do so. A well-known family of SDP methods relies on deep generative models. This study explores the use of deep generative models to synthesise individual-level COVID-19 data. The models considered are Generative Adversarial Networks (GAN), Adversarial Autoencoders (AAE), and Adversarial Variational Bayes (AVB). The study found that AAE and AVB outperform GAN in loss, distribution, and privacy preservation, mainly when using the Wasserstein approach. Furthermore, predictions on the real dataset made using the synthetic data reached a sensitivity and an F1 score of more than 0.8. Unfortunately, the synthetic data still has drawbacks and biases, especially when used to fit statistical models. It is therefore essential to improve the deep generative models, especially in maintaining the statistical guarantees of the dataset.
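The abstract reports sensitivity and F1 score without defining them. As a reminder of the standard definitions, the minimal sketch below computes both from binary labels. It is not the authors' evaluation code: the function name `sensitivity_and_f1` and the toy labels are hypothetical, and the positive class is assumed to be encoded as 1.

```python
def sensitivity_and_f1(y_true, y_pred):
    """Return (sensitivity, F1) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted positives that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1


# Hypothetical toy labels, for illustration only (not data from the paper).
real_labels = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1]
sens, f1 = sensitivity_and_f1(real_labels, predicted)  # both equal 0.75 here
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and sensitivity are both 0.75, and so is their harmonic mean.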