Privacy preserving vertical distributed learning for health data

T. Islam, Noman Mohammed, Dima Alhadidi
Journal of Surveillance, Security and Safety, 2024. DOI: 10.20517/jsss.2023.28
Federated learning has become a pivotal tool in healthcare, enabling valuable insights to be drawn from disparate datasets held by data owners who are cautious about data privacy. Data are analyzed where they reside, and the locally computed results are then aggregated on a central server. In this decentralized setting, data can be partitioned horizontally (each site holds different records) or vertically (each site holds different features). Our approach uses vertical partitioning, in which every local site holds a different subset of characteristics, or columns, for the same records; this setting is known as vertical distributed learning or feature-distributed machine learning. Our collaborative learning approach uses stochastic gradient descent (SGD) to learn jointly from all local sites, and the final result is computed on a central server. Notably, no raw data or model parameters are exchanged during training; only local prediction results are shared and aggregated. Because sharing local prediction results still raises privacy concerns, we mitigate them by adding noise to the local results with a differential privacy algorithm. This paper introduces a robust vertical distributed learning system that emphasizes user privacy for healthcare data. To evaluate our approach, we conducted experiments on sensitive healthcare data from the Medical Information Mart for Intensive Care III (MIMIC-III) dataset and on the publicly available Adult dataset. The results show that our approach achieves accuracy similar to that of a fully centralized model and significantly higher than training on local features alone. Our solution therefore offers an effective federated learning approach for healthcare, preserving data locality and privacy while making efficient use of vertically partitioned data.
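The abstract describes the protocol only at a high level. What follows is a minimal Python sketch of one plausible reading, built on assumptions the source does not state: each site trains its own slice of a logistic-regression model, the server holds the labels, sums the sites' local predictions, and returns a scalar error per record that each site uses for its own SGD step. The differential-privacy step is modeled here as the Laplace mechanism applied to clipped local predictions, which may differ from the algorithm the authors use. All names (`LocalSite`, `release`, `train_vertical`, `build_sites`), the clipping bound, the L2 term, and the synthetic data are hypothetical. The sketch performs no privacy accounting across repeated releases and does not privatize the returned error, so it illustrates the data flow rather than a formal guarantee or the authors' implementation.

```python
import math
import random


def laplace_noise(scale, rng):
    """One Laplace(0, scale) draw: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LocalSite:
    """Hypothetical party holding a vertical slice: every record, but only some feature columns."""

    def __init__(self, columns, clip, epsilon, rng):
        self.x = columns                   # row i = this site's features for shared record i
        self.w = [0.0] * len(columns[0])   # local weights; they never leave the site
        self.clip = clip                   # assumed bound C on each released local prediction
        self.epsilon = epsilon             # privacy parameter per release (no composition accounting)
        self.rng = rng

    def local_score(self, i):
        """Clipped local prediction w_k . x_ik, so that it lies in [-C, C]."""
        s = sum(wj * xj for wj, xj in zip(self.w, self.x[i]))
        return max(-self.clip, min(self.clip, s))

    def release(self, i):
        """Laplace mechanism: sensitivity 2C, so the noise scale is 2C / epsilon."""
        return self.local_score(i) + laplace_noise(2.0 * self.clip / self.epsilon, self.rng)

    def sgd_step(self, i, error, lr, l2):
        """Update only this site's weights; d(log-loss)/dw_k = error * x_ik (clipping ignored)."""
        for j, xj in enumerate(self.x[i]):
            self.w[j] -= lr * (error * xj + l2 * self.w[j])


def train_vertical(sites, labels, epochs=20, lr=0.05, l2=0.1, seed=0):
    """Server loop: aggregate noisy local predictions and send back a scalar error per record."""
    order = list(range(len(labels)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(site.release(i) for site in sites)  # only predictions reach the server
            error = sigmoid(z) - labels[i]               # assumes the server holds the labels
            for site in sites:
                site.sgd_step(i, error, lr, l2)


def accuracy(sites, labels):
    """Sanity check on noise-free (clipped) scores; this is evaluation, not a private release."""
    hits = sum(int((sum(s.local_score(i) for s in sites) > 0) == (y == 1))
               for i, y in enumerate(labels))
    return hits / len(labels)


def make_data(n, seed=1):
    """Synthetic records whose labels follow a fixed linear rule over four features."""
    rng = random.Random(seed)
    true_w = [1.5, -2.0, 1.0, 0.5]
    rows = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(n)]
    labels = [1 if sum(a * b for a, b in zip(true_w, r)) > 0 else 0 for r in rows]
    return rows, labels


def build_sites(rows, clip, epsilon, seed):
    """Vertical split: site A holds columns 0-1, site B holds columns 2-3 of the same records."""
    rng = random.Random(seed)
    return [LocalSite([r[:2] for r in rows], clip, epsilon, rng),
            LocalSite([r[2:] for r in rows], clip, epsilon, rng)]


rows, labels = make_data(400)
sites = build_sites(rows, clip=4.0, epsilon=5.0, seed=42)
train_vertical(sites, labels)
print(f"training accuracy with noisy aggregation: {accuracy(sites, labels):.3f}")
```

Only the output of `release` crosses a site boundary, which mirrors the abstract's point that raw features and model parameters stay local. Lowering `epsilon` adds more noise per release, while lowering `clip` reduces the noise scale at the cost of truncating more local predictions.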