Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms
Rachel Heyburn, R. Bond, Michaela M. Black, M. Mulvenna, J. Wallace, Debbie Rankin, Brian Cleland
Data Science and Knowledge Engineering for Sensing Decision Support, 2018-07-30
DOI: 10.1142/9789813273238_0160 (https://doi.org/10.1142/9789813273238_0160)
Citations: 19
Abstract
Sharing data often poses a security and privacy risk, especially if the data are sensitive. Algorithms can generate synthetic data from an original raw dataset so that the shared data are considered more ‘privacy preserving’ and offer a higher level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics of machine learning models trained on synthetic data with those of models trained on the corresponding real data.
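The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy two-feature dataset, the hypothetical `synthesize` function (which samples each feature from a per-class Gaussian fitted to the real training rows), the nearest-centroid classifier, and the choice to score both models on the same held-out real rows. The abstract does not name the synthesizer, the algorithms, or the evaluation protocol, so this only shows the general shape of such an experiment.

```python
import random
import statistics


def make_real_data(n, rng):
    """Toy stand-in for a real dataset: two numeric features, binary label."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else 0.0
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return rows


def group_by_label(rows):
    """Collect feature vectors per class label."""
    groups = {}
    for features, label in rows:
        groups.setdefault(label, []).append(features)
    return groups


def synthesize(rows, n, rng):
    """Hypothetical synthesizer: fit a Gaussian per class and feature, then sample."""
    params = {
        label: [(statistics.mean(col), statistics.stdev(col)) for col in zip(*feats)]
        for label, feats in group_by_label(rows).items()
    }
    labels = [label for _, label in rows]
    synthetic = []
    for _ in range(n):
        label = rng.choice(labels)  # keeps the class balance of the input rows
        synthetic.append(([rng.gauss(m, s) for m, s in params[label]], label))
    return synthetic


def fit_centroids(rows):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    return {
        label: [statistics.mean(col) for col in zip(*feats)]
        for label, feats in group_by_label(rows).items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(features, centroids[lab])),
    )


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, f) == y for f, y in rows) / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
real = make_real_data(600, rng)
train_real, test_real = real[:400], real[400:]
train_synth = synthesize(train_real, len(train_real), rng)

# Train one model per data source; score both on the same held-out real rows.
acc_real = accuracy(fit_centroids(train_real), test_real)
acc_synth = accuracy(fit_centroids(train_synth), test_real)

print(f"trained on real:      {acc_real:.3f}")
print(f"trained on synthetic: {acc_synth:.3f}")
print(f"absolute difference:  {abs(acc_real - acc_synth):.3f}")
```

A small absolute difference between the two scores would suggest that, for this toy setup, a model trained on synthetic data behaves much like one trained on real data. The same pattern extends to other metrics and classifiers by swapping `accuracy` and `fit_centroids` for the alternatives of interest.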