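A scalable federated learning solution for secondary care using low-cost microcomputing: privacy-preserving development and evaluation of a COVID-19 screening test in UK hospitals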

IF 23.8 | Zone 1, Medicine | JCR Q1, MEDICAL INFORMATICS | Lancet Digital Health | Pub Date: 2024-01-24 | DOI: 10.1016/S2589-7500(23)00226-1

Andrew A S Soltan MRCP , Anshul Thakur PhD , Jenny Yang MSc , Prof Anoop Chauhan FRCP , Leon G D’Cruz PhD , Phillip Dickson BSc , Marina A Soltan MRCP , Prof David R Thickett FRCP , Prof David W Eyre DPhil , Prof Tingting Zhu DPhil , Prof David A Clifton DPhil


Citations: 0

Abstract



A scalable federated learning solution for secondary care using low-cost microcomputing: privacy-preserving development and evaluation of a COVID-19 screening test in UK hospitals

Background

Multicentre training could reduce biases in medical artificial intelligence (AI); however, ethical, legal, and technical considerations can constrain the ability of hospitals to share data. Federated learning enables institutions to participate in algorithm development while retaining custody of their data but uptake in hospitals has been limited, possibly as deployment requires specialist software and technical expertise at each site. We previously developed an artificial intelligence-driven screening test for COVID-19 in emergency departments, known as CURIAL-Lab, which uses vital signs and blood tests that are routinely available within 1 h of a patient's arrival. Here we aimed to federate our COVID-19 screening test by developing an easy-to-use embedded system—which we introduce as full-stack federated learning—to train and evaluate machine learning models across four UK hospital groups without centralising patient data.

Methods

We supplied a Raspberry Pi 4 Model B preloaded with our federated learning software pipeline to four National Health Service (NHS) hospital groups in the UK: Oxford University Hospitals NHS Foundation Trust (OUH; through the locally linked research University, University of Oxford), University Hospitals Birmingham NHS Foundation Trust (UHB), Bedfordshire Hospitals NHS Foundation Trust (BH), and Portsmouth Hospitals University NHS Trust (PUH). OUH, PUH, and UHB participated in federated training, training a deep neural network and logistic regressor over 150 rounds to form and calibrate a global model to predict COVID-19 status, using clinical data from patients admitted before the pandemic (COVID-19-negative) and testing positive for COVID-19 during the first wave of the pandemic. We conducted a federated evaluation of the global model for admissions during the second wave of the pandemic at OUH, PUH, and externally at BH. For OUH and PUH, we additionally performed local fine-tuning of the global model using the sites’ individual training data, forming a site-tuned model, and evaluated the resultant model for admissions during the second wave of the pandemic. This study included data collected between Dec 1, 2018, and March 1, 2021; the exact date ranges used varied by site. The primary outcome was overall model performance, measured as the area under the receiver operating characteristic curve (AUROC). Removable micro secure digital (microSD) storage was destroyed on study completion.
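The round-based training described above can be illustrated with a minimal sketch. The abstract does not name the aggregation rule or the software used on the devices, so this example assumes plain federated averaging (FedAvg) of a logistic regressor, with three synthetic "sites" standing in for the hospital groups. All function names, the learning rate, and the data generator are hypothetical and chosen only for illustration; only the 150-round count is taken from the text.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, data, lr=0.1, epochs=1):
    """One site's contribution: gradient descent on its own data only."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its local sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def make_site(n, seed):
    """Synthetic stand-in for one site's records: ([bias, feature], label)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(2.0 * x - 1.0) else 0
        data.append(([1.0, x], y))
    return data


def train_federated(sites, rounds=150):
    """Only model weights leave a site; raw records stay local."""
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, d) for d in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Three hypothetical sites of different sizes.
sites = [make_site(200, 1), make_site(300, 2), make_site(500, 3)]
global_model = train_federated(sites)
print(global_model)
```

In this sketch the coordinator sees only weight vectors, never patient-level rows, which is the property that lets each site retain custody of its data. Calibration, the deep neural network, and local fine-tuning are omitted.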

Findings

Clinical data from 130 941 patients (1772 COVID-19-positive), routinely collected across three hospital groups (OUH, PUH, and UHB), were included in federated training. The evaluation step included data from 32 986 patients (3549 COVID-19-positive) attending OUH, PUH, or BH during the second wave of the pandemic. Federated training of a global deep neural network classifier improved upon performance of models trained locally in terms of AUROC by a mean of 27·6% (SD 2·2): AUROC increased from 0·574 (95% CI 0·560–0·589) at OUH and 0·622 (0·608–0·637) at PUH using the locally trained models to 0·872 (0·862–0·882) at OUH and 0·876 (0·865–0·886) at PUH using the federated global model. Performance improvement was smaller for a logistic regression model, with a mean increase in AUROC of 13·9% (0·5%). During federated external evaluation at BH, AUROC for the global deep neural network model was 0·917 (0·893–0·942), with 89·7% sensitivity (83·6–93·6) and 76·6% specificity (73·9–79·1). Site-specific tuning of the global model did not significantly improve performance (change in AUROC <0·01).
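As a reading aid, the short calculation below checks how the reported deep neural network improvement relates to the per-site AUROC values quoted above. The interpretation is an inference from the numbers, not something the abstract states: the "27·6% (SD 2·2)" figure appears consistent with the mean absolute gain in AUROC, expressed in percentage points, with the SD taken as a population SD over the two sites. The variable names are illustrative.

```python
from statistics import mean, pstdev

# AUROC point estimates quoted in the Findings (deep neural network).
local_auroc = {"OUH": 0.574, "PUH": 0.622}
global_auroc = {"OUH": 0.872, "PUH": 0.876}

# Absolute gain per site (global minus locally trained model).
gains = [global_auroc[site] - local_auroc[site] for site in local_auroc]

# Express as percentage points; pstdev is the population SD (divides by n).
mean_gain_pts = round(mean(gains) * 100, 1)
sd_gain_pts = round(pstdev(gains) * 100, 1)

# Relative gains, for contrast with the absolute reading.
relative_gains_pct = [
    round(100 * (global_auroc[s] - local_auroc[s]) / local_auroc[s], 1)
    for s in local_auroc
]

print(mean_gain_pts, sd_gain_pts, relative_gains_pct)
```

Under this reading the absolute gains are 0·298 at OUH and 0·254 at PUH. The relative gains are considerably larger, so the reported mean should not be read as a relative improvement.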

Interpretation

We developed an embedded system for federated learning, using microcomputing to optimise for ease of deployment. We deployed full-stack federated learning across four UK hospital groups to develop a COVID-19 screening test without centralising patient data. Federation improved model performance, and the resultant global models were generalisable. Full-stack federated learning could enable hospitals to contribute to AI development at low cost and without specialist technical expertise at each site.

Funding

The Wellcome Trust, University of Oxford Medical and Life Sciences Translational Fund.


Source journal

Lancet Digital Health Multiple-

CiteScore

41.20

Self-citation rate

1.60%

Articles published

232

Review time

13 weeks

About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.