Privacy-preserving federated prediction of pain intensity change based on multi-center survey data

Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullah, Niklas Probul, Jan Baumbach, Linda Baumbach
arXiv - CS - Machine Learning. Published 2024-09-12. DOI: arxiv-2409.07997 (https://doi.org/arxiv-2409.07997)
Citations: 0

Abstract

Background: Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically collected across multiple centers and, for privacy reasons, cannot easily be centralized in one data repository. Models trained locally are less accurate, robust, and generalizable. We present and apply privacy-preserving federated machine learning techniques for prognostic model building, where local survey data never leave the legally safe harbors of the medical centers.

Methods: We used centralized, local, and federated learning techniques on two healthcare datasets (GLA:D data from the five health regions of Denmark and international SHARE data from 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with those trained on the full dataset in a centralized and in a federated fashion.

Results: In the GLA:D data, the federated linear regression (R2 0.34, RMSE 18.2) and federated random forest regression (R2 0.34, RMSE 18.3) models outperformed their local counterparts (R2 0.32, RMSE 18.6 and R2 0.30, RMSE 18.8, respectively) with statistical significance. The centralized models (R2 0.34, RMSE 18.2 and R2 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. In SHARE, the federated model (accuracy (AC) 0.78, AUROC 0.71) and the centralized model (AC 0.84, AUROC 0.66) performed significantly better than the local models (AC 0.74, AUROC 0.69).

Conclusion: Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with only minimal or no compromise regarding model performance.
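The abstract does not say how the federated linear regression was implemented. As an illustration only, the sketch below shows one common approach: each center computes the aggregates X^T X and X^T y on its own data, and an aggregator sums them and solves the normal equations. The function names, the five synthetic "centers", and the data are all hypothetical, and sharing these aggregates is not by itself a formal privacy guarantee.

```python
import numpy as np


def local_statistics(X, y):
    """Runs inside one center; only the two aggregates leave the site."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    return Xb.T @ Xb, Xb.T @ y


def federated_linear_regression(site_stats):
    """Aggregator: sum the per-site aggregates, then solve the normal equations."""
    xtx = sum(stats[0] for stats in site_stats)
    xty = sum(stats[1] for stats in site_stats)
    return np.linalg.solve(xtx, xty)


def r2_rmse(X, y, beta):
    """R2 and RMSE of a fitted coefficient vector (intercept first)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    resid = y - Xb @ beta
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return r2, rmse


# Synthetic stand-in for five centers (assumption: not the GLA:D data).
rng = np.random.default_rng(0)
true_beta = np.array([2.0, -1.0, 0.5])
sites = []
for n in (120, 80, 200, 150, 60):
    X = rng.normal(size=(n, 3))
    y = 10.0 + X @ true_beta + rng.normal(scale=1.0, size=n)
    sites.append((X, y))

# Federated fit: raw rows never leave their site, only aggregates do.
beta_fed = federated_linear_regression(
    [local_statistics(X, y) for X, y in sites]
)

# Centralized fit on pooled data, for comparison only.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
Xb_all = np.column_stack([np.ones(len(X_all)), X_all])
beta_central = np.linalg.lstsq(Xb_all, y_all, rcond=None)[0]

# Local fit on the first site alone, as a baseline.
beta_local = federated_linear_regression([local_statistics(*sites[0])])
```

In this setup the federated and centralized coefficients agree up to numerical precision, because summing the per-site aggregates reproduces the pooled normal equations. That matches the reported finding that centralized linear regression did not outperform the federated one. Calling `r2_rmse(X_all, y_all, beta_fed)` and `r2_rmse(X_all, y_all, beta_local)` gives the kind of federated-versus-local comparison the abstract describes.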