Privacy preserving vertical distributed learning for health data

Journal of Surveillance, Security and Safety. Pub Date: 2024-01-01. DOI: 10.20517/jsss.2023.28

T. Islam, Noman Mohammed, Dima Alhadidi


Citations: 0

Abstract

Federated learning has become a pivotal tool in healthcare, enabling valuable insights to be gleaned from disparate datasets held by data owners who are cautious about data privacy. The method analyzes data held at diverse locations, then aggregates the results and trains on a central server. In this decentralized setup, data can be partitioned vertically or horizontally. In our approach, we employ a vertical partition learning process in which data are segmented by characteristics, or columns, for each record across all local sites; this is known as Vertical Distributed Learning or feature-distributed machine learning. Our collaborative learning approach uses Stochastic Gradient Descent to learn collectively from each local site and compute the final result on a central server. Notably, during the training phase, no raw data or model parameters are exchanged; only local prediction results are shared and aggregated. Sharing local prediction results still raises privacy concerns, which we mitigate by adding noise to the local results using a Differential Privacy algorithm. This paper introduces a robust vertical distributed learning system that emphasizes user privacy for healthcare data. To assess our approach, we conducted experiments on the sensitive healthcare data in the Medical Information Mart for Intensive Care III (MIMIC-III) dataset and on the publicly available Adult dataset. Our experimental results demonstrate that our approach achieves an accuracy level similar to that of a fully centralized model and significantly surpasses training based solely on local features. Consequently, our solution offers an effective federated learning approach for healthcare, preserving data locality and privacy while efficiently harnessing vertically partitioned data.
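The training loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal logistic-regression example under several assumptions that the abstract does not state: labels are held by the server, each site clips its local score to a bound `clip` so that its sensitivity is at most `2 * clip`, the noise is Laplace with scale `2 * clip / epsilon`, and the server broadcasts a scalar residual back to the sites. All names (`Site`, `train`, `make_data`, `clip`, `epsilon`) are hypothetical, and the sketch does not track how the privacy budget composes over repeated queries.

```python
import math
import random


def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Site:
    """A data owner holding some feature columns for every record."""

    def __init__(self, rows, clip=8.0, epsilon=None, seed=0):
        self.rows = rows                      # local columns only; never shared
        self.weights = [0.0] * len(rows[0])   # local parameters; never shared
        self.clip = clip                      # assumed bound on the local score
        self.epsilon = epsilon                # None disables the noise
        self.rng = random.Random(seed)

    def local_score(self, i):
        # Partial prediction computed from the local features of record i.
        z = sum(w * x for w, x in zip(self.weights, self.rows[i]))
        z = max(-self.clip, min(self.clip, z))  # bounds sensitivity to 2 * clip
        if self.epsilon is not None:
            z += laplace_noise(self.rng, 2.0 * self.clip / self.epsilon)
        return z

    def update(self, i, residual, lr):
        # SGD step for the logistic loss on the local weights.
        self.weights = [w - lr * residual * x
                        for w, x in zip(self.weights, self.rows[i])]


def train(sites, labels, epochs=10, lr=0.1, seed=0):
    rng = random.Random(seed)
    order = list(range(len(labels)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # The server receives only the (possibly noisy) local scores.
            total = sum(site.local_score(i) for site in sites)
            residual = sigmoid(total) - labels[i]
            # Assumption: the server broadcasts the scalar residual.
            for site in sites:
                site.update(i, residual, lr)


def accuracy(sites, labels):
    hits = 0
    for i, y in enumerate(labels):
        total = sum(site.local_score(i) for site in sites)
        hits += int((total > 0) == (y == 1))
    return hits / len(labels)


def make_data(n=200, seed=1):
    # Synthetic records: the label depends on one column from each site.
    rng = random.Random(seed)
    full = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    labels = [1 if r[0] + r[2] > 0 else 0 for r in full]
    rows_a = [r[:2] for r in full]   # columns held by site A
    rows_b = [r[2:] for r in full]   # columns held by site B
    return rows_a, rows_b, labels
```

Calling `train([Site(rows_a), Site(rows_b)], labels)` trains on both column groups without either site revealing its rows or weights, while `train([Site(rows_a)], labels)` corresponds to training on local features only. Passing `epsilon=1.0` to `Site` turns on the noise; smaller values of `epsilon` add more noise and would be expected to lower accuracy.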
