Fundamental Limits of Personalized Federated Linear Regression with Data Heterogeneity
Chun-Ying Hou, I-Hsiang Wang
2022 IEEE International Symposium on Information Theory (ISIT), published 2022-06-26
DOI: 10.1109/ISIT50566.2022.9834894
Abstract
Federated learning is a nascent framework for collaborative machine learning over networks of devices, each with its own local data and local model updates. Data heterogeneity across the devices is one of the challenges confronting this emerging field. Personalization is a natural approach to utilizing information from other users’ data while taking data heterogeneity into account. In this work, we study the linear regression problem in which the data across users are generated from different regression vectors. We present an information-theoretic lower bound on the minimax expected excess risk of personalized linear models, and we show an upper bound that matches the lower bound to within constant factors. The results characterize the effect of data heterogeneity on learning performance and the trade-off among sample size, problem difficulty, and distribution discrepancy, suggesting that the discrepancy-to-difficulty ratio is the key factor governing the usefulness of other users’ heterogeneous data.
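To make the setting concrete, the following toy experiment (a sketch written for this summary, not code from the paper) contrasts a local-only least-squares estimator with one that naively pools a second user's data drawn from a shifted regression vector. All names and parameter choices here (`n`, `d`, `delta`, `sigma`, the isotropic Gaussian design) are illustrative assumptions; varying `delta` relative to the noise level gives an informal feel for the discrepancy-to-difficulty trade-off the abstract describes.

```python
# Toy sketch (illustrative assumptions, not the paper's construction):
# two users share a linear model up to a discrepancy of size `delta`.
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 5, 50, 1.0   # dimension, samples per user, noise level
delta = 0.5                # rough size of the inter-user discrepancy

beta1 = rng.standard_normal(d)
beta2 = beta1 + delta * rng.standard_normal(d) / np.sqrt(d)

def sample(beta, n):
    """Draw n samples from y = X beta + noise with isotropic Gaussian X."""
    X = rng.standard_normal((n, d))
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y

X1, y1 = sample(beta1, n)
X2, y2 = sample(beta2, n)

# Local estimator: user 1 fits on its own data only.
b_local = np.linalg.lstsq(X1, y1, rcond=None)[0]
# Pooled estimator: naively merges both users' data.
b_pool = np.linalg.lstsq(np.vstack([X1, X2]),
                         np.concatenate([y1, y2]), rcond=None)[0]

# Under an isotropic Gaussian design, the excess risk of an estimate b
# for user 1 is proportional to the squared error ||b - beta1||^2.
risk_local = float(np.sum((b_local - beta1) ** 2))
risk_pool = float(np.sum((b_pool - beta1) ** 2))
print(f"excess risk  local: {risk_local:.3f}  pooled: {risk_pool:.3f}")
```

When `delta` is small relative to `sigma`, pooling tends to help by cutting the variance of the estimate; when `delta` is large, the pooled estimator's bias toward the other user's vector dominates and local-only fitting wins. This is exactly the regime change that the paper's bounds quantify via the discrepancy-to-difficulty ratio.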