FedDSS: A data-similarity approach for client selection in horizontal federated learning

International Journal of Medical Informatics, Volume 192, Article 105650. Pub Date: 2024-10-16. DOI: 10.1016/j.ijmedinf.2024.105650
IF 3.7; Zone 2 (Medicine); JCR Q2 (Computer Science, Information Systems)
Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, Jan Hau Lee
Citations: 0

Abstract


Background and objective

Federated learning (FL) is an emerging distributed learning framework that allows multiple clients (hospitals, institutions, smart devices, etc.) to collaboratively train a centralized machine learning model without disclosing personal data. It has the potential to address several healthcare challenges, including a lack of training data, data privacy, and security concerns. However, model learning under FL is affected by non-i.i.d. data: because data distributions vary across clients, the model can diverge severely and its performance can drop. To address this problem, we propose FedDSS (Federated Data Similarity Selection), a framework that uses a data-similarity approach to select clients without compromising client data privacy.

Methods

FedDSS comprises a statistics-based data similarity metric, an N-similar-neighbor network, and a network-based selection strategy. We assessed the performance of FedDSS against that of FedAvg in i.i.d. and non-i.i.d. settings with two public pediatric sepsis datasets (PICD and MIMICIII). Selection fairness was measured using entropy. Simulations were repeated five times to evaluate average loss, true positive rate (TPR), and entropy.
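The two evaluation quantities named above, entropy of client selection and TPR, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not define how entropy is computed, so the sketch assumes Shannon entropy with the natural logarithm over the empirical frequency with which each client is selected. The function names `selection_entropy` and `true_positive_rate` are hypothetical.

```python
import math
from collections import Counter


def selection_entropy(selected_ids):
    """Shannon entropy (natural log) of how often each client was selected.

    Assumption: the abstract does not give the formula or the log base;
    this uses H = -sum(p_i * ln(p_i)) over empirical selection frequencies.
    """
    counts = Counter(selected_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

Under this assumed definition, entropy is highest when every client is selected equally often and is zero when the same client is always selected, which is why a higher value indicates fairer selection.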

Results

In the i.i.d. setting on PICD, FedDSS achieved a higher TPR from the 9th round onward and surpassed 0.6 three rounds earlier than FedAvg. On MIMICIII, FedDSS's loss decreased significantly from the 13th round, with TPR > 0.8 by the 2nd round, two rounds ahead of FedAvg (at the 4th round). In the non-i.i.d. setting, FedDSS achieved TPR > 0.7 by the 4th round and > 0.8 by the 7th round, earlier than FedAvg (at the 5th and 11th rounds). In both settings, FedDSS showed reasonable fairness (entropy of 2.2 and 2.1).
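The comparisons above are stated as the communication round at which TPR first exceeds a threshold. A minimal sketch of that bookkeeping is shown here; the helper name `first_round_above` is hypothetical, and the curves in the usage lines are made-up illustrative values, not results from the paper.

```python
def first_round_above(tpr_by_round, threshold):
    """Return the 1-based round at which TPR first exceeds `threshold`.

    Returns None if the threshold is never exceeded.
    """
    for round_idx, tpr in enumerate(tpr_by_round, start=1):
        if tpr > threshold:
            return round_idx
    return None


# Illustrative (invented) TPR curves for two selection strategies.
curve_a = [0.55, 0.66, 0.72, 0.79, 0.83]
curve_b = [0.50, 0.58, 0.64, 0.71, 0.76]

round_a = first_round_above(curve_a, 0.7)
round_b = first_round_above(curve_b, 0.7)
```

With per-round TPR averaged over repeated simulations, the difference between the two returned rounds gives the "rounds earlier" figure used in this kind of comparison.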

Conclusion

We demonstrated that FedDSS contributes to improved learning in FL by achieving faster convergence, reaching the desired TPR with fewer communication rounds, and potentially enhancing sepsis prediction (TPR) over FedAvg.
Source journal
International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.