Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms

Rachel Heyburn, R. Bond, Michaela M. Black, M. Mulvenna, J. Wallace, Debbie Rankin, Brian Cleland
Published in: Data Science and Knowledge Engineering for Sensing Decision Support, 2018-07-30
DOI: 10.1142/9789813273238_0160 (https://doi.org/10.1142/9789813273238_0160)
Citations: 19

Abstract

Sharing data often carries security and privacy risks, especially when the data are sensitive. Algorithms can generate synthetic data from an original raw dataset so that the shared data are considered more ‘privacy preserving’ and offer a higher level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics of machine learning models trained on synthetic data with those of models trained on the corresponding real data.
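The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the datasets, the synthetic data generator, the learning algorithms, or the evaluation protocol, so everything below is a hypothetical stand-in: a toy two-feature dataset (`make_real_data`), a naive per-class Gaussian sampler (`synthesise`), a nearest-centroid classifier, and accuracy measured on a held-out split of the real data. It shows only the shape of the experiment, not the authors' method.

```python
import random
import statistics

def make_real_data(n, rng):
    """Toy stand-in for a 'real' dataset: two numeric features, binary label."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else 0.0
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return rows

def group_by_label(rows):
    """Collect feature vectors under their class label."""
    groups = {}
    for x, y in rows:
        groups.setdefault(y, []).append(x)
    return groups

def synthesise(rows, n, rng):
    """Assumed generator: fit an independent Gaussian to each feature within
    each class, then sample new rows. Real generators are more sophisticated."""
    params = {
        y: [(statistics.mean(col), statistics.stdev(col)) for col in zip(*xs)]
        for y, xs in group_by_label(rows).items()
    }
    labels = [y for _, y in rows]  # sampling from these keeps the class balance
    out = []
    for _ in range(n):
        y = rng.choice(labels)
        out.append(([rng.gauss(mu, sd) for mu, sd in params[y]], y))
    return out

def fit_centroids(rows):
    """Nearest-centroid 'model': one mean vector per class."""
    return {
        y: [statistics.mean(col) for col in zip(*xs)]
        for y, xs in group_by_label(rows).items()
    }

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
real = make_real_data(600, rng)
train_real, test_real = real[:400], real[400:]
train_synth = synthesise(train_real, len(train_real), rng)

# Same algorithm, same real test split; only the training data differ.
acc_real = accuracy(fit_centroids(train_real), test_real)
acc_synth = accuracy(fit_centroids(train_synth), test_real)

print(f"trained on real data:      {acc_real:.3f}")
print(f"trained on synthetic data: {acc_synth:.3f}")
print(f"absolute difference:       {abs(acc_real - acc_synth):.3f}")
```

A small absolute difference between the two scores would suggest that, for this dataset and algorithm, the synthetic data support a similar modelling conclusion. Repeating the loop across several datasets and algorithms gives the kind of comparison the title refers to.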