CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid). Pub Date: 2023-02-21. DOI: 10.1109/CCGrid57682.2023.00032

Nang Hung Nguyen, Duc Long Nguyen, Trong Bang Nguyen, T. Nguyen, H. Pham, Truong Thao Nguyen, Phi-Le Nguyen


Citations: 0

Abstract

Federated learning enables edge devices to train a global model collaboratively without exposing their data. Despite its advantages in computing efficiency and privacy protection, federated learning faces a significant challenge when dealing with non-IID data, i.e., client-generated data that are typically not independent and identically distributed. In this paper, we tackle a new type of non-IID data, called cluster-skewed non-IID, discovered in actual data sets. Cluster-skewed non-IID is a phenomenon in which clients can be grouped into clusters with similar data distributions. By performing an in-depth analysis of the behavior of a classification model's penultimate layer, we introduce a metric that quantifies the similarity between two clients' data distributions without violating their privacy. We then propose an aggregation scheme that guarantees equality between clusters. In addition, we offer a novel local training regularization based on the knowledge-distillation technique that reduces overfitting at clients and dramatically boosts the training scheme's performance. We theoretically prove the superiority of the proposed aggregation over the benchmark FedAvg. Extensive experimental results on both standard public datasets and our in-house real-world dataset demonstrate that the proposed approach improves accuracy by up to 16% compared to the FedAvg algorithm.
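The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of one possible reading of "equality between clusters": clients are averaged within their cluster first, and each cluster then contributes equally to the global model. The function names (`fedavg`, `cluster_balanced_average`), the flat parameter lists, and the pre-assigned cluster labels are all assumptions made for illustration; they are not taken from the paper, which derives clusters from a penultimate-layer similarity metric that is not reproduced here.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def fedavg(updates: Sequence[List[float]], num_samples: Sequence[int]) -> List[float]:
    """Standard FedAvg: weight each client's parameters by its sample count."""
    total = sum(num_samples)
    dim = len(updates[0])
    return [
        sum(u[i] * n for u, n in zip(updates, num_samples)) / total
        for i in range(dim)
    ]


def cluster_balanced_average(
    updates: Sequence[List[float]],
    num_samples: Sequence[int],
    cluster_ids: Sequence[Hashable],
) -> List[float]:
    """Hypothetical cluster-balanced rule (an assumption, not the paper's formula).

    Step 1: FedAvg inside each cluster.
    Step 2: plain mean over the cluster-level averages, so every cluster
            carries the same weight regardless of how many clients it has.
    """
    groups = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        groups[cid].append(idx)

    cluster_means = [
        fedavg([updates[i] for i in members], [num_samples[i] for i in members])
        for members in groups.values()
    ]
    dim = len(updates[0])
    return [sum(m[i] for m in cluster_means) / len(cluster_means) for i in range(dim)]


# Toy data: three clients in cluster "a" share similar parameters,
# while a single client forms cluster "b".
updates = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
num_samples = [10, 10, 10, 10]
cluster_ids = ["a", "a", "a", "b"]

print(fedavg(updates, num_samples))                                 # [2.0, 2.0]
print(cluster_balanced_average(updates, num_samples, cluster_ids))  # [3.0, 3.0]
```

In this toy case FedAvg is pulled toward the larger cluster, whereas the cluster-balanced variant lands midway between the two cluster averages. This illustrates why cluster-skewed data can bias sample-weighted averaging; it makes no claim about the accuracy gains reported in the abstract.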



