Privacy preserving vertical distributed learning for health data

T. Islam, Noman Mohammed, Dima Alhadidi
Journal of Surveillance, Security and Safety. Published 2024-01-01. DOI: 10.20517/jsss.2023.28 (https://doi.org/10.20517/jsss.2023.28)

Abstract

Federated learning has become a pivotal tool in healthcare, enabling valuable insights to be drawn from disparate datasets whose owners are cautious about data privacy. The method analyzes data from diverse locations; the results are subsequently aggregated and used for training on a central server. In this decentralized setup, data can be distributed across sites either vertically or horizontally. In our approach, we employ a unique vertical partition learning process in which each local site holds a different subset of the characteristics (columns) of every record, a setting known as Vertical Distributed Learning or feature-distributed machine learning. Our collaborative learning approach uses Stochastic Gradient Descent to learn jointly from each local site and to compute the final result on a central server. Notably, during the training phase, no raw data or model parameters are exchanged; only local prediction results are shared and aggregated. Because sharing local prediction results still raises privacy concerns, we mitigate the risk by adding noise to the local results using a Differential Privacy algorithm. This paper introduces a robust vertical distributed learning system that emphasizes user privacy for healthcare data. To assess our approach, we conducted experiments on the sensitive healthcare data in the Medical Information Mart for Intensive Care III (MIMIC-III) dataset and on the publicly available Adult dataset. Our experimental results demonstrate that our approach achieves an accuracy similar to that of a fully centralized model and significantly surpasses training based solely on local features. Consequently, our solution offers an effective federated learning approach for healthcare that preserves data locality and privacy while efficiently harnessing vertically partitioned data.
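The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it only shows the general idea under several assumptions that do not come from the abstract: each site trains a linear scorer on its own columns, local results are clipped and perturbed with the Laplace mechanism, the central server holds the labels, and the server returns a per-record residual to the sites. All function names and parameter values (`split_columns`, `privatize`, `aggregate`, `sgd_step`, `clip`, `epsilon`, `lr`) are hypothetical, and the data is synthetic.

```python
import numpy as np

def split_columns(X, column_groups):
    """Vertical partition: each site keeps every record but only its own columns."""
    return [X[:, cols] for cols in column_groups]

def privatize(values, clip, epsilon, rng):
    """Clip local results to [-clip, clip], then add Laplace noise.

    A clipped value can change by at most 2 * clip, so the noise scale is
    2 * clip / epsilon (Laplace mechanism). The budget accounting across
    many training steps is not modelled here.
    """
    clipped = np.clip(values, -clip, clip)
    scale = 2.0 * clip / epsilon
    return clipped + rng.laplace(loc=0.0, scale=scale, size=clipped.shape)

def aggregate(site_results):
    """Server side: sum the per-site scores and map them to probabilities."""
    total = np.sum(site_results, axis=0)
    return 1.0 / (1.0 + np.exp(-total))

def sgd_step(X_sites, weights, y, batch, lr, clip, epsilon, rng):
    """One mini-batch step; only noisy local results leave a site."""
    noisy = [privatize(Xs[batch] @ w, clip, epsilon, rng)
             for Xs, w in zip(X_sites, weights)]
    # Assumption: the server holds the labels and sends this residual back.
    residual = aggregate(noisy) - y[batch]
    return [w - lr * Xs[batch].T @ residual / len(batch)
            for Xs, w in zip(X_sites, weights)]

# Synthetic stand-in data: 200 records, 4 features, split across two sites.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(float)
X_sites = split_columns(X, [[0, 1], [2, 3]])
weights = [np.zeros(2), np.zeros(2)]

for _ in range(300):
    batch = rng.choice(len(y), size=32, replace=False)
    weights = sgd_step(X_sites, weights, y, batch,
                       lr=0.5, clip=3.0, epsilon=5.0, rng=rng)

# Noise-free evaluation of the jointly trained scorers.
probs = aggregate([Xs @ w for Xs, w in zip(X_sites, weights)])
accuracy = float(np.mean((probs > 0.5) == (y == 1)))
```

In this sketch a smaller `epsilon` means more noise on every shared result and therefore stronger privacy at some cost in accuracy; the clipping bound is what makes the noise scale well defined.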