Exploring the Distributed Knowledge Congruence in Proxy-data-free Federated Distillation
Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Quyang Pan, Junbo Zhang, Zeju Li, Qingxiang Liu
ACM Transactions on Intelligent Systems and Technology · Published 2023-12-29 · DOI: 10.1145/3639369
Abstract
Federated learning (FL) is a privacy-preserving machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data. Constrained communication and personalization requirements pose severe challenges to FL. Federated distillation (FD) has been proposed to address both problems simultaneously: it exchanges knowledge between the server and clients, supporting heterogeneous local models while significantly reducing communication overhead. However, most existing FD methods require a proxy dataset, which is often unavailable in practice. A few recent proxy-data-free FD approaches eliminate the need for additional public data, but they suffer from significant discrepancies among clients' local knowledge caused by client-side model heterogeneity, leading to ambiguous representations on the server and inevitable accuracy degradation. To tackle this issue, we propose a proxy-data-free FD algorithm based on distributed knowledge congruence (FedDKC). FedDKC leverages well-designed refinement strategies to narrow the differences among local knowledge to within an acceptable upper bound, so as to mitigate the negative effects of knowledge incongruence. Specifically, from the perspectives of the peak probability and the Shannon entropy of local knowledge, we design kernel-based knowledge refinement (KKR) and searching-based knowledge refinement (SKR), respectively, and theoretically guarantee that the refined local knowledge follows approximately similar distributions and can be regarded as congruent. Extensive experiments conducted on three common datasets demonstrate that our proposed FedDKC significantly outperforms state-of-the-art methods under various heterogeneous settings while markedly improving convergence speed.
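To make the refinement idea concrete, below is a minimal, hypothetical sketch of one way server-side knowledge refinement could be realized: each client's soft predictions are rescaled so that their Shannon entropy matches a server-chosen target, loosely in the spirit of SKR. The function names, the temperature-scaled softmax, the bisection search, and the target-entropy value are all assumptions made for illustration; the paper's actual KKR and SKR strategies are specified in the full text and may differ.

```python
# Illustrative sketch only -- not the authors' FedDKC implementation.
# Assumption: local "knowledge" is a vector of logits, and refinement
# means matching a server-chosen target Shannon entropy via temperature.
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over a 1-D logit vector."""
    z = logits / temperature
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()


def shannon_entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())


def refine_to_target_entropy(logits: np.ndarray, target_entropy: float,
                             lo: float = 0.05, hi: float = 20.0,
                             iters: int = 50) -> np.ndarray:
    """Bisect a temperature so the refined knowledge hits the target entropy.

    The entropy of a temperature-scaled softmax increases monotonically
    with the temperature, so a simple bisection suffices for this sketch.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if shannon_entropy(softmax(logits, mid)) < target_entropy:
            lo = mid                     # too sharp -> raise temperature
        else:
            hi = mid                     # too smooth -> lower temperature
    return softmax(logits, 0.5 * (lo + hi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two heterogeneous clients whose logits differ greatly in sharpness.
    client_logits = [rng.normal(scale=5.0, size=10),
                     rng.normal(scale=0.5, size=10)]
    target = 1.5                         # server-chosen entropy bound (assumed)
    for i, logits in enumerate(client_logits):
        raw, refined = softmax(logits), refine_to_target_entropy(logits, target)
        print(f"client {i}: entropy {shannon_entropy(raw):.3f} -> "
              f"{shannon_entropy(refined):.3f}")
```

In this toy setting, both clients' refined knowledge ends up with (approximately) the same entropy, illustrating how bounding the distributional discrepancy on the server side could reduce the ambiguity caused by heterogeneous local models.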
Journal Description:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST is published bimonthly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.