Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullah, Niklas Probul, Jan Baumbach, Linda Baumbach
arXiv - CS - Machine Learning, arXiv:2409.07997, published 2024-09-12
Privacy-preserving federated prediction of pain intensity change based on multi-center survey data
Background: Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically distributed across multiple centers and, for privacy reasons, cannot easily be centralized in a single data repository. Models trained locally on a single center's data are less accurate, less robust, and less generalizable. We present and apply privacy-preserving federated machine learning techniques for building prognostic models, in which local survey data never leave the legally safe harbors of the medical centers.

Methods: We applied centralized, local, and federated learning techniques to two healthcare datasets (GLA:D data from the five health regions of Denmark and international SHARE data from 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with models trained on the entire dataset in a centralized and in a federated fashion.

Results: In the GLA:D data, the federated linear regression (R² 0.34, RMSE 18.2) and federated random forest regression (R² 0.34, RMSE 18.3) models outperformed their local counterparts (R² 0.32, RMSE 18.6 and R² 0.30, RMSE 18.8, respectively) with statistical significance. We also found that the corresponding centralized models (R² 0.34, RMSE 18.2 and R² 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. In the SHARE data, both the federated model (accuracy 0.78, AUROC 0.71) and the centralized model (accuracy 0.84, AUROC 0.66) performed significantly better than the local models (accuracy 0.74, AUROC 0.69).

Conclusion: Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with little or no loss in model performance.
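The abstract does not say how the federated linear regression was trained. As an illustration only, here is a minimal sketch of one common scheme, assumed here and not confirmed as the authors' method: each center computes the sufficient statistics XᵀX and Xᵀy on its own survey rows and shares only these aggregates, and a coordinator sums them and solves the normal equations. Under this scheme the federated coefficients equal those of a centralized fit on the pooled rows, which is consistent with the matching federated and centralized linear regression scores reported for GLA:D (R² 0.34, RMSE 18.2). All function names and the synthetic data are hypothetical, and federated random forests are not covered by this sketch.

```python
"""Hypothetical sketch: federated ordinary least squares (OLS) regression.

Assumption (not taken from the paper): each center shares only the aggregates
X^T X and X^T y computed on its own rows; raw survey rows never leave the center.
"""
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def local_statistics(X: Matrix, y: Vector) -> Tuple[Matrix, Vector]:
    """Run at one center: return X^T X and X^T y with an intercept column prepended."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return xtx, xty


def aggregate(stats: List[Tuple[Matrix, Vector]]) -> Tuple[Matrix, Vector]:
    """Run at the coordinator: element-wise sum of all centers' statistics."""
    p = len(stats[0][1])
    xtx = [[sum(s[0][i][j] for s in stats) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in stats) for i in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve the normal equations A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(w: Vector, X: Matrix) -> Vector:
    """Apply intercept w[0] plus slopes w[1:] to each row of X."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def r2_rmse(y: Vector, y_hat: Vector) -> Tuple[float, float]:
    """R^2 = 1 - SS_res / SS_tot and RMSE = sqrt(SS_res / n)."""
    n = len(y)
    mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot, (ss_res / n) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)

    def make_center(n: int) -> Tuple[Matrix, Vector]:
        # Synthetic, hypothetical data: y = 2 + 3*x1 - x2 + Gaussian noise.
        X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]
        y = [2 + 3 * a - b + rng.gauss(0, 1) for a, b in X]
        return X, y

    centers = [make_center(n) for n in (40, 60, 25)]  # three hypothetical centers

    # Federated fit: only the aggregated statistics leave each center.
    w_fed = solve(*aggregate([local_statistics(X, y) for X, y in centers]))

    # Centralized fit on the pooled rows, for comparison.
    X_all = [row for X, _ in centers for row in X]
    y_all = [v for _, y in centers for v in y]
    w_cen = solve(*local_statistics(X_all, y_all))

    print("federated  :", [round(w, 4) for w in w_fed])
    print("centralized:", [round(w, 4) for w in w_cen])
    r2, rmse = r2_rmse(y_all, predict(w_fed, X_all))
    print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Running the script prints the federated and centralized coefficients, which agree up to floating-point rounding, followed by R² and RMSE computed with the standard definitions. Exact equivalence is specific to linear regression; models such as random forests generally need a different federation scheme.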