Evaluation of Query-Based Membership Inference Attack on the Medical Data

Lakshmi Prasanna Pedarla, Xinyue Zhang, Liang Zhao, Hafiz Khan
{"title":"基于查询的医疗数据隶属推理攻击评价","authors":"Lakshmi Prasanna Pedarla, Xinyue Zhang, Liang Zhao, Hafiz Khan","doi":"10.1145/3564746.3587027","DOIUrl":null,"url":null,"abstract":"In recent years, machine learning (ML) has achieved huge success in healthcare and medicine areas. However, recent work has demonstrated that ML is vulnerable to privacy leakage since it exhibits to overfit the training datasets. Especially, in healthcare and medical communities, there are concerns that medical images and electronic health records containing protected health information (PHI) are vulnerable to inference attacks. These PHI might be unwittingly leaked when the aforementioned data is used for training ML models to address necessary healthcare concerns. Given access to the trained ML model, the attacker is able to adopt membership inference attacks (MIA) to determine whether a specific data sample is used in the corresponding medical training dataset. In this paper, we concentrate on MIA and propose a new method to determine whether a sample was used to train the given ML model or not. Our method is based on the observation that a trained machine learning model usually is lesser sensitive to the feature value perturbations on its training samples compared with the non-training samples. The key idea of our method is to perturb a training sample's feature value in the corresponding feature space and then compute the relationship between each feature's perturbation and the corresponding prediction's change as features to train the attack model. We used publicly available medical datasets such as diabetes and heartbeat categorization data to evaluate our method. Our evaluation shows that the proposed attack can perform better than the existing membership inference attack method.","PeriodicalId":322431,"journal":{"name":"Proceedings of the 2023 ACM Southeast Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of Query-Based Membership Inference Attack on the Medical Data\",\"authors\":\"Lakshmi Prasanna Pedarla, Xinyue Zhang, Liang Zhao, Hafiz Khan\",\"doi\":\"10.1145/3564746.3587027\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, machine learning (ML) has achieved huge success in healthcare and medicine areas. However, recent work has demonstrated that ML is vulnerable to privacy leakage since it exhibits to overfit the training datasets. Especially, in healthcare and medical communities, there are concerns that medical images and electronic health records containing protected health information (PHI) are vulnerable to inference attacks. These PHI might be unwittingly leaked when the aforementioned data is used for training ML models to address necessary healthcare concerns. Given access to the trained ML model, the attacker is able to adopt membership inference attacks (MIA) to determine whether a specific data sample is used in the corresponding medical training dataset. In this paper, we concentrate on MIA and propose a new method to determine whether a sample was used to train the given ML model or not. Our method is based on the observation that a trained machine learning model usually is lesser sensitive to the feature value perturbations on its training samples compared with the non-training samples. 
The key idea of our method is to perturb a training sample's feature value in the corresponding feature space and then compute the relationship between each feature's perturbation and the corresponding prediction's change as features to train the attack model. We used publicly available medical datasets such as diabetes and heartbeat categorization data to evaluate our method. Our evaluation shows that the proposed attack can perform better than the existing membership inference attack method.\",\"PeriodicalId\":322431,\"journal\":{\"name\":\"Proceedings of the 2023 ACM Southeast Conference\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 ACM Southeast Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3564746.3587027\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 ACM Southeast Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3564746.3587027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, machine learning (ML) has achieved huge success in the healthcare and medicine areas. However, recent work has demonstrated that ML models are vulnerable to privacy leakage because they tend to overfit their training datasets. In the healthcare and medical communities in particular, there are concerns that medical images and electronic health records containing protected health information (PHI) are vulnerable to inference attacks: PHI might be unwittingly leaked when such data is used to train ML models for healthcare applications. Given access to a trained ML model, an attacker can mount a membership inference attack (MIA) to determine whether a specific data sample was part of the model's training dataset. In this paper, we focus on MIA and propose a new method to determine whether or not a sample was used to train a given ML model. Our method is based on the observation that a trained model is usually less sensitive to feature-value perturbations on its training samples than on non-training samples. The key idea is to perturb a sample's feature values in the corresponding feature space and then use the relationship between each feature's perturbation and the resulting change in the model's prediction as features for training the attack model. We evaluated our method on publicly available medical datasets, such as diabetes and heartbeat-categorization data. Our evaluation shows that the proposed attack performs better than an existing membership inference attack method.
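The abstract only sketches the attack at a high level. As a rough illustration of the key idea, the following Python sketch extracts a per-feature sensitivity vector by perturbing each feature and measuring how much the model's prediction changes, then trains a binary attack classifier on shadow data with known membership labels. The perturbation size `epsilon`, the L1 distance score, the shadow-model setup, and all names (`sensitivity_features`, `train_attack_model`, `X_shadow`, `y_member`) are assumptions for illustration, not details taken from the paper.

```python
# A minimal sketch of the perturbation-sensitivity idea, assuming a
# scikit-learn-style model exposing predict_proba; epsilon and the L1
# distance score are illustrative choices, not the paper's settings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def sensitivity_features(model, x, epsilon=0.1):
    """Perturb each feature of sample x by +epsilon and record how much
    the model's predicted probability vector changes (L1 distance).
    Training members are expected to show smaller changes."""
    base = model.predict_proba(x.reshape(1, -1))[0]
    deltas = np.empty(x.shape[0])
    for j in range(x.shape[0]):
        x_pert = x.copy()
        x_pert[j] += epsilon
        pert = model.predict_proba(x_pert.reshape(1, -1))[0]
        deltas[j] = np.abs(pert - base).sum()
    return deltas


def train_attack_model(shadow_model, X_shadow, y_member, epsilon=0.1):
    """Build per-feature sensitivity vectors on shadow data whose
    membership is known, then fit a binary attack classifier.
    y_member[i] is 1 if X_shadow[i] was in the shadow model's training
    set and 0 otherwise (hypothetical names)."""
    A = np.stack([sensitivity_features(shadow_model, x, epsilon)
                  for x in X_shadow])
    attack = RandomForestClassifier(n_estimators=100, random_state=0)
    attack.fit(A, y_member)
    return attack
```

At attack time, under the same assumptions, one would query the target model to compute `sensitivity_features` for a candidate sample and feed the resulting vector to the attack classifier; a positive prediction indicates the sample was likely in the training set.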