Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms

Data Science and Knowledge Engineering for Sensing Decision Support
Pub Date: 2018-07-30
DOI: 10.1142/9789813273238_0160

Rachel Heyburn, R. Bond, Michaela M. Black, M. Mulvenna, J. Wallace, Debbie Rankin, Brian Cleland

Citations: 19

Abstract

Sharing data often poses security and privacy risks, especially when the data are sensitive. Algorithms can be used to generate synthetic data from an original raw dataset, so that the data shared are considered more ‘privacy preserving’ and offer a higher level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics produced by machine learning models trained on synthetic data with the metrics yielded by machine learning models trained on the corresponding real data.
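To make the comparison concrete, here is a minimal, stdlib-only sketch of the experimental design the abstract describes. Everything in it is a hypothetical illustration and not the paper's method or data: the toy dataset (`make_toy_real_data`), the synthesizer (independent per-class Gaussian sampling in `synthesize`), and the nearest-centroid classifier are stand-ins. The sketch also assumes that both models are scored on the same held-out real test set, a detail the abstract does not specify.

```python
import math
import random
from statistics import mean, stdev


def make_toy_real_data(n, rng):
    """Hypothetical stand-in for a real dataset: two numeric features and a
    binary label, with class 1 shifted upward on both features."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x1 = rng.gauss(1.0 + 1.5 * label, 1.0)
        x2 = rng.gauss(0.5 + 1.0 * label, 1.0)
        rows.append(((x1, x2), label))
    return rows


def synthesize(rows, n, rng):
    """Hypothetical synthesizer (assumption, not the paper's algorithm):
    fit a mean/stdev per feature per class, then sample each feature
    independently, drawing labels with the real class proportions."""
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append(x)
    params = {y: [(mean(c), stdev(c)) for c in zip(*xs)] for y, xs in by_class.items()}
    labels = [y for _, y in rows]
    synthetic = []
    for _ in range(n):
        y = rng.choice(labels)
        x = tuple(rng.gauss(mu, sd) for mu, sd in params[y])
        synthetic.append((x, y))
    return synthetic


def fit_nearest_centroid(rows):
    """Train a nearest-centroid classifier: one mean feature vector per class."""
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append(x)
    return {y: tuple(mean(c) for c in zip(*xs)) for y, xs in by_class.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def evaluate(centroids, test_rows):
    """Compute accuracy, precision, recall and F1 (class 1 as positive)."""
    tp = fp = fn = tn = 0
    for x, y in test_rows:
        p = predict(centroids, x)
        if p == 1 and y == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(test_rows)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def compare(seed=0):
    """Train one model on real data and one on synthetic data derived from it,
    score both on the same held-out real test set, and report |difference|."""
    rng = random.Random(seed)
    real = make_toy_real_data(1000, rng)
    train, test = real[:700], real[700:]
    synthetic_train = synthesize(train, len(train), rng)
    real_metrics = evaluate(fit_nearest_centroid(train), test)
    synth_metrics = evaluate(fit_nearest_centroid(synthetic_train), test)
    diffs = {k: abs(real_metrics[k] - synth_metrics[k]) for k in real_metrics}
    return real_metrics, synth_metrics, diffs


if __name__ == "__main__":
    real_m, synth_m, diffs = compare()
    for name in real_m:
        print(f"{name:9s} real={real_m[name]:.3f} "
              f"synthetic={synth_m[name]:.3f} |diff|={diffs[name]:.3f}")
```

Small per-metric differences would suggest that, for this toy setup, a model trained on synthetic data behaves similarly to one trained on real data. Swapping in a different synthesizer, classifier, or dataset only requires replacing the corresponding function.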

