The growing need for privacy-preserving machine learning has made centralized data collection increasingly infeasible. To address this, Federated Learning (FL) has emerged as a distributed learning paradigm in which multiple clients collaboratively train a shared global model while keeping all data local, ensuring that no private data is sent over the network. However, FL is often hindered by statistical heterogeneity, where clients' data are not independent and identically distributed (non-IID), resulting in biased local updates and degraded global model performance. To overcome this challenge, this study proposes FedEMMD, a novel method for improving model performance on heterogeneous data. First, entropy-based data selection is used to identify high-quality data with a lower degree of non-IIDness. Second, Maximum Mean Discrepancy (MMD) is used to measure the divergence between local updates and the global model, ensuring that only stable and consistent updates are aggregated into the global model. Experiments are conducted in two heterogeneous settings (non-IID and long-tailed distributions) using CIFAR-10 and CIFAR-10-LT. In addition, we run centralized Machine Learning (ML) experiments under the same settings to establish a baseline for assessing the effect of data heterogeneity in the centralized case. The results show that FedEMMD outperforms state-of-the-art algorithms such as FedAvg, FedProx, SCAFFOLD, and FedOpt in accuracy and convergence speed in both the non-IID and long-tailed scenarios, thereby improving robustness and performance under heterogeneous settings.
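
To make the two steps concrete, a minimal sketch is given below. It is illustrative only and not the paper's implementation: it assumes the entropy criterion is computed over each client's empirical label distribution (higher label entropy is treated as less non-IID), that MMD is computed with an RBF kernel between flattened local-update and global-model parameter vectors, and that `label_entropy`, `rbf_mmd`, `entropy_min`, and `mmd_max` are hypothetical names and thresholds introduced here for exposition.

```python
import numpy as np

def label_entropy(labels, num_classes=10):
    """Shannon entropy of a client's empirical label distribution.
    Higher entropy indicates a more balanced (less non-IID) local dataset."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]                      # drop empty classes to avoid log(0)
    return float(-(p * np.log(p)).sum())

def rbf_mmd(x, y, gamma=1.0):
    """Biased estimate of squared MMD with an RBF kernel between two 1-D
    samples (here: flattened local-update and global parameter vectors)."""
    x, y = x.reshape(-1, 1), y.reshape(-1, 1)
    def k(a, b):
        d = a - b.T                   # pairwise differences
        return np.exp(-gamma * d ** 2)
    return float(k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean())

def select_and_aggregate(global_w, client_updates, client_labels,
                         entropy_min=1.5, mmd_max=0.1):
    """Keep clients whose label entropy is high enough (step 1) and whose
    update stays close to the global model in MMD (step 2), then average."""
    kept = []
    for w, labels in zip(client_updates, client_labels):
        if label_entropy(labels) < entropy_min:
            continue                  # skewed labels: likely strongly non-IID
        if rbf_mmd(w, global_w) > mmd_max:
            continue                  # update diverges too far from the global model
        kept.append(w)
    if not kept:                      # no client passed: keep current global model
        return global_w
    return np.mean(kept, axis=0)      # FedAvg-style averaging of retained updates
```

In this sketch both filters act per round on the server side; how FedEMMD actually weighs or thresholds the entropy and MMD scores is described in the method section, not here.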
