FedDSS: A data-similarity approach for client selection in horizontal federated learning

IF 3.7; CAS Zone 2 (Medicine); JCR Q2 (Computer Science, Information Systems); International Journal of Medical Informatics; Pub Date: 2024-10-16; DOI: 10.1016/j.ijmedinf.2024.105650

Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, Jan Hau Lee

International Journal of Medical Informatics, Volume 192, Article 105650. https://www.sciencedirect.com/science/article/pii/S1386505624003137


FedDSS: A data-similarity approach for client selection in horizontal federated learning

Background and objective

Federated learning (FL) is an emerging distributed learning framework that allows multiple clients (hospitals, institutions, smart devices, etc.) to collaboratively train a centralized machine learning model without disclosing personal data. It has the potential to address several healthcare challenges, including a lack of training data, data privacy, and security concerns. However, model learning under FL is affected by non-i.i.d. data: because data distributions vary across clients, the model can diverge severely and perform worse. To address this problem, we propose FedDSS (Federated Data Similarity Selection), a framework that uses a data-similarity approach to select clients without compromising client data privacy.
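To make the idea of training a shared model without exchanging raw data concrete, the sketch below shows the aggregation step of federated averaging (FedAvg, the baseline named in the Methods): the server combines client parameter vectors, weighted by each client's sample count. The function name `fedavg_aggregate`, the toy parameter vectors, and the sample counts are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)   # shape: (num_clients, num_params)
    coeffs = sizes / sizes.sum()         # each client's share of all samples
    return coeffs @ stacked              # shape: (num_params,)

# Toy example: three hypothetical clients, each holding a 2-parameter model.
updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 100, 200]
global_weights = fedavg_aggregate(updates, sizes)
print(global_weights)
```

Only parameter vectors and sample counts reach the server in this sketch; the clients' records stay local, which is the property the abstract relies on.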

Methods

FedDSS comprises a statistics-based data similarity metric, an N-similar-neighbor network, and a network-based selection strategy. We assessed FedDSS's performance against FedAvg's in i.i.d. and non-i.i.d. settings with two public pediatric sepsis datasets (PICD and MIMICIII). Selection fairness was measured using entropy. Simulations were repeated five times to evaluate average loss, true positive rate (TPR), and entropy.
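The abstract does not specify the similarity metric or how the neighbor network is built, so the following is only a rough sketch of the general idea under stated assumptions: each client shares per-feature means and standard deviations (hypothetical summary statistics), similarity is taken as a small Euclidean distance between those summaries, and each client is linked to its N most similar peers. The names `summary_vector` and `n_similar_neighbors` and the synthetic data are invented for illustration and are not the authors' method.

```python
import numpy as np

def summary_vector(X):
    """Per-feature mean and standard deviation: the only values shared in this sketch."""
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

def n_similar_neighbors(summaries, n):
    """Map each client index to the indices of its n most similar clients."""
    S = np.stack(summaries)
    # Pairwise Euclidean distances between summary vectors (smaller = more similar).
    dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)       # a client is not its own neighbor
    order = np.argsort(dist, axis=1)[:, :n]
    return {i: order[i].tolist() for i in range(len(summaries))}

rng = np.random.default_rng(0)
# Four synthetic clients: two centered at 0 and two at 5 (a crude non-i.i.d. split).
clients = [rng.normal(loc=mu, scale=1.0, size=(200, 3)) for mu in (0.0, 0.0, 5.0, 5.0)]
network = n_similar_neighbors([summary_vector(X) for X in clients], n=1)
print(network)
```

A selection strategy would then operate on this network, for example by favoring clients whose neighborhoods cover different regions of the data; the paper's actual strategy is not described in the abstract.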

Results

In the i.i.d. setting on PICD, FedDSS achieved a higher TPR starting from the 9th round and surpassed 0.6 three rounds earlier than FedAvg. On MIMICIII, FedDSS's loss decreased significantly from the 13th round, and its TPR exceeded 0.8 by the 2nd round, two rounds ahead of FedAvg (4th round). In the non-i.i.d. setting, FedDSS achieved TPR > 0.7 by the 4th round and > 0.8 by the 7th round, earlier than FedAvg (5th and 11th rounds, respectively). In both settings, FedDSS showed reasonable fairness (entropy of 2.2 and 2.1).
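As a rough illustration of how entropy can quantify selection fairness, the sketch below computes the Shannon entropy of how often each client is selected across rounds. The abstract does not state the logarithm base or the number of clients, so the natural logarithm, the ten-client setup, and the name `selection_entropy` are assumptions. Under these assumptions the maximum, reached when all clients are selected equally often, is ln(10) ≈ 2.30.

```python
import math
from collections import Counter

def selection_entropy(selected_ids, num_clients):
    """Shannon entropy (natural log) of the empirical selection distribution."""
    counts = Counter(selected_ids)
    total = len(selected_ids)
    probs = [counts.get(c, 0) / total for c in range(num_clients)]
    # Clients that were never selected contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical selection logs over 100 picks among 10 clients.
uniform = [r % 10 for r in range(100)]    # every client picked 10 times
skewed = [0] * 91 + list(range(1, 10))    # client 0 dominates
print(selection_entropy(uniform, 10))
print(selection_entropy(skewed, 10))
```

Higher values mean selections are spread more evenly across clients; a value of zero means a single client was always chosen.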

Conclusion

We demonstrated that FedDSS contributes to improved learning in FL by achieving faster convergence, reaching the desired TPR with fewer communication rounds, and potentially enhancing sepsis prediction (TPR) over FedAvg.
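The notion of reaching a desired TPR in fewer communication rounds can be sketched as follows: TPR is the fraction of actual positives that are predicted positive, and convergence speed is compared by finding the first round in which TPR exceeds a target. The helper names and both per-round curves below are invented for illustration and are not results from the paper.

```python
def tpr(y_true, y_pred):
    """True positive rate: TP / (TP + FN), for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def first_round_above(tpr_per_round, threshold):
    """1-based index of the first round whose TPR exceeds the threshold, else None."""
    for round_number, value in enumerate(tpr_per_round, start=1):
        if value > threshold:
            return round_number
    return None

# Invented per-round TPR curves for two hypothetical selection strategies.
curve_a = [0.55, 0.72, 0.81, 0.84]
curve_b = [0.50, 0.61, 0.70, 0.79, 0.83]
print(first_round_above(curve_a, 0.8), first_round_above(curve_b, 0.8))
```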


Source journal

International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)

CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days

About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.