Tuong Minh Nguyen , Kim Leng Poh , Shu-Ling Chong , Jan Hau Lee
{"title":"FedDSS:横向联合学习中客户选择的数据相似性方法","authors":"Tuong Minh Nguyen , Kim Leng Poh , Shu-Ling Chong , Jan Hau Lee","doi":"10.1016/j.ijmedinf.2024.105650","DOIUrl":null,"url":null,"abstract":"<div><h3>Background and objective</h3><div>Federated learning (FL) is an emerging distributed learning framework allowing multiple clients (hospitals, institutions, smart devices, etc.) to collaboratively train a centralized machine learning model without disclosing personal data. It has the potential to address several healthcare challenges, including a lack of training data, data privacy, and security concerns. However, model learning under FL is affected by non-i.i.d. data, leading to severe model divergence and reduced performance due to the varying client's data distributions. To address this problem, we propose FedDSS, Federated Data Similarity Selection, a framework that uses a data-similarity approach to select clients, without compromising client data privacy.</div></div><div><h3>Methods</h3><div>FedDSS comprises a statistical-based data similarity metric, a <em>N</em>-similar-neighbor network, and a network-based selection strategy. We assessed FedDSS' performance against FedAvg's in i.i.d. and non-i.i.d. settings with two public pediatric sepsis datasets (PICD and MIMICIII). Selection fairness was measured using <span><math><mi>e</mi><mi>n</mi><mi>t</mi><mi>r</mi><mi>o</mi><mi>p</mi><mi>y</mi></math></span>. Simulations were repeated five times to evaluate average loss, true positive rate (TPR), and <span><math><mi>e</mi><mi>n</mi><mi>t</mi><mi>r</mi><mi>o</mi><mi>p</mi><mi>y</mi></math></span>.</div></div><div><h3>Results</h3><div>In i.i.d setting on PICD, FedDSS achieved a higher TPR starting from the 9th round and surpassing 0.6 three rounds earlier than FedAvg. On MIMICIII, FedDSS's loss decreases significantly from the 13th round, with TPR > 0.8 by the 2nd round, two rounds ahead of FedAvg (at the 4th round). In the non-i.i.d. setting, FedDSS achieved TPR > 0.7 by the 4th and > 0.8 by the 7th round, earlier than FedAvg (at the 5th and 11th rounds). In both settings, FedDSS showed reasonable fairness (<span><math><mi>e</mi><mi>n</mi><mi>t</mi><mi>r</mi><mi>o</mi><mi>p</mi><mi>y</mi></math></span> of 2.2 and 2.1).</div></div><div><h3>Conclusion</h3><div>We demonstrated that FedDSS contributes to improved learning in FL by achieving faster convergence, reaching the desired TPR with fewer communication rounds, and potentially enhancing sepsis prediction (TPR) over FedAvg.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"192 ","pages":"Article 105650"},"PeriodicalIF":3.7000,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FedDSS: A data-similarity approach for client selection in horizontal federated learning\",\"authors\":\"Tuong Minh Nguyen , Kim Leng Poh , Shu-Ling Chong , Jan Hau Lee\",\"doi\":\"10.1016/j.ijmedinf.2024.105650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background and objective</h3><div>Federated learning (FL) is an emerging distributed learning framework allowing multiple clients (hospitals, institutions, smart devices, etc.) to collaboratively train a centralized machine learning model without disclosing personal data. It has the potential to address several healthcare challenges, including a lack of training data, data privacy, and security concerns. However, model learning under FL is affected by non-i.i.d. data, leading to severe model divergence and reduced performance due to the varying client's data distributions. To address this problem, we propose FedDSS, Federated Data Similarity Selection, a framework that uses a data-similarity approach to select clients, without compromising client data privacy.</div></div><div><h3>Methods</h3><div>FedDSS comprises a statistical-based data similarity metric, a <em>N</em>-similar-neighbor network, and a network-based selection strategy. We assessed FedDSS' performance against FedAvg's in i.i.d. and non-i.i.d. settings with two public pediatric sepsis datasets (PICD and MIMICIII). Selection fairness was measured using <span><math><mi>e</mi><mi>n</mi><mi>t</mi><mi>r</mi><mi>o</mi><mi>p</mi><mi>y</mi></math></span>. Simulations were repeated five times to evaluate average loss, true positive rate (TPR), and <span><math><mi>e</mi><mi>n</mi><mi>t</mi><mi>r</mi><mi>o</mi><mi>p</mi><mi>y</mi></math></span>.</div></div><div><h3>Results</h3><div>In i.i.d setting on PICD, FedDSS achieved a higher TPR starting from the 9th round and surpassing 0.6 three rounds earlier than FedAvg. On MIMICIII, FedDSS's loss decreases significantly from the 13th round, with TPR > 0.8 by the 2nd round, two rounds ahead of FedAvg (at the 4th round). In the non-i.i.d. setting, FedDSS achieved TPR > 0.7 by the 4th and > 0.8 by the 7th round, earlier than FedAvg (at the 5th and 11th rounds). In both settings, FedDSS showed reasonable fairness (<span><math><mi>e</mi><mi>n</mi><mi>t</mi><mi>r</mi><mi>o</mi><mi>p</mi><mi>y</mi></math></span> of 2.2 and 2.1).</div></div><div><h3>Conclusion</h3><div>We demonstrated that FedDSS contributes to improved learning in FL by achieving faster convergence, reaching the desired TPR with fewer communication rounds, and potentially enhancing sepsis prediction (TPR) over FedAvg.</div></div>\",\"PeriodicalId\":54950,\"journal\":{\"name\":\"International Journal of Medical Informatics\",\"volume\":\"192 \",\"pages\":\"Article 105650\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1386505624003137\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1386505624003137","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
FedDSS: A data-similarity approach for client selection in horizontal federated learning
Background and objective
Federated learning (FL) is an emerging distributed learning framework allowing multiple clients (hospitals, institutions, smart devices, etc.) to collaboratively train a centralized machine learning model without disclosing personal data. It has the potential to address several healthcare challenges, including a lack of training data, data privacy, and security concerns. However, model learning under FL is affected by non-i.i.d. data, leading to severe model divergence and reduced performance due to the varying client's data distributions. To address this problem, we propose FedDSS, Federated Data Similarity Selection, a framework that uses a data-similarity approach to select clients, without compromising client data privacy.
Methods
FedDSS comprises a statistical-based data similarity metric, a N-similar-neighbor network, and a network-based selection strategy. We assessed FedDSS' performance against FedAvg's in i.i.d. and non-i.i.d. settings with two public pediatric sepsis datasets (PICD and MIMICIII). Selection fairness was measured using . Simulations were repeated five times to evaluate average loss, true positive rate (TPR), and .
Results
In i.i.d setting on PICD, FedDSS achieved a higher TPR starting from the 9th round and surpassing 0.6 three rounds earlier than FedAvg. On MIMICIII, FedDSS's loss decreases significantly from the 13th round, with TPR > 0.8 by the 2nd round, two rounds ahead of FedAvg (at the 4th round). In the non-i.i.d. setting, FedDSS achieved TPR > 0.7 by the 4th and > 0.8 by the 7th round, earlier than FedAvg (at the 5th and 11th rounds). In both settings, FedDSS showed reasonable fairness ( of 2.2 and 2.1).
Conclusion
We demonstrated that FedDSS contributes to improved learning in FL by achieving faster convergence, reaching the desired TPR with fewer communication rounds, and potentially enhancing sepsis prediction (TPR) over FedAvg.
期刊介绍:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician''s office systems, document handling systems, electronic medical record systems, standardization, systems integration etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.
Educational computer based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.