{"title":"Linear-Regression on Packed Encrypted Data in the Two-Server Model","authors":"Adi Akavia, Hayim Shaul, Mor Weiss, Z. Yakhini","doi":"10.1145/3338469.3358942","DOIUrl":null,"url":null,"abstract":"Developing machine learning models from federated training data, containing many independent samples, is an important task that can significantly enhance the potential applicability and prediction power of learned models. Since single users, like hospitals or individual labs, typically collect data-sets that do not support accurate learning with high confidence, it is desirable to combine data from several users without compromising data privacy. In this paper, we develop a privacy-preserving solution for learning a linear regression model from data collectively contributed by several parties (\"data owners''). Our protocol is based on the protocol of Giacomelli et al. (ACNS 2018) that utilized two non colluding servers and Linearly Homomorphic Encryption (LHE) to learn regularized linear regression models. Our methods use a different LHE scheme that allows us to significantly reduce both the number and runtime of homomorphic operations, as well as the total runtime complexity. Another advantage of our protocol is that the underlying LHE scheme is based on a different (and post-quantum secure) security assumption than Giacomelli et al. Our approach leverages the Chinese Remainder Theorem, and Single Instruction Multiple Data representations, to obtain our improved performance. For a 1000 x 40 linear regression task we can learn a model in a total of 3 seconds for the homomorphic operations, compared to more than 100 seconds reported in the literature. Our approach also scales up to larger feature spaces: we implemented a system that can handle a 1000 x 100 linear regression task, investing minutes of server computing time after a more significant offline pre-processing by the data owners. We intend to incorporate our protocol and implementations into a comprehensive system that can handle secure federated learning at larger scales.","PeriodicalId":332171,"journal":{"name":"Proceedings of the 7th ACM Workshop on Encrypted Computing & Applied Homomorphic Cryptography","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th ACM Workshop on Encrypted Computing & Applied Homomorphic Cryptography","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3338469.3358942","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 18
Abstract
Developing machine learning models from federated training data, containing many independent samples, is an important task that can significantly enhance the applicability and predictive power of the learned models. Since single users, such as hospitals or individual labs, typically collect datasets that are too small to support accurate, high-confidence learning, it is desirable to combine data from several users without compromising data privacy. In this paper, we develop a privacy-preserving solution for learning a linear regression model from data collectively contributed by several parties ("data owners"). Our protocol builds on the protocol of Giacomelli et al. (ACNS 2018), which used two non-colluding servers and Linearly Homomorphic Encryption (LHE) to learn regularized linear regression models. Our method uses a different LHE scheme that lets us significantly reduce both the number of homomorphic operations and their runtime, as well as the total runtime. Another advantage of our protocol is that the underlying LHE scheme rests on a different (and post-quantum secure) security assumption than that of Giacomelli et al. Our approach leverages the Chinese Remainder Theorem and Single Instruction Multiple Data (SIMD) representations to obtain this improved performance. For a 1000 x 40 linear regression task, we can learn a model with a total of 3 seconds of homomorphic computation, compared to more than 100 seconds reported in the literature. Our approach also scales to larger feature spaces: we implemented a system that handles a 1000 x 100 linear regression task with minutes of server computing time, after a more significant offline pre-processing phase at the data owners. We intend to incorporate our protocol and implementations into a comprehensive system that can handle secure federated learning at larger scales.
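To make the CRT/SIMD idea concrete, below is a minimal plaintext sketch (in Python, with hypothetical slot moduli) of how the Chinese Remainder Theorem packs several values into a single integer so that one arithmetic operation acts on all slots at once. The actual protocol applies this packing inside the plaintext space of the LHE scheme, which is not modeled here.

```python
# Illustrative CRT-based SIMD packing over plain integers (not the paper's
# actual encoding); the slot moduli below are hypothetical choices.
from math import prod

PRIMES = [1009, 1013, 1019, 1021]  # pairwise-coprime slot moduli (assumed)
M = prod(PRIMES)

def pack(values):
    """Encode one value per 'slot' into a single integer via the CRT."""
    assert len(values) == len(PRIMES)
    x = 0
    for v, p in zip(values, PRIMES):
        n = M // p
        x = (x + v * n * pow(n, -1, p)) % M  # standard CRT reconstruction term
    return x

def unpack(x):
    """Recover the per-slot values by reducing modulo each slot prime."""
    return [x % p for p in PRIMES]

a = pack([1, 2, 3, 4])
b = pack([10, 20, 30, 40])

# Addition and multiplication of packed integers act slot-wise (as long as
# slot values stay below their moduli), so one operation on the packed value
# processes all slots simultaneously.
assert unpack((a + b) % M) == [11, 22, 33, 44]
assert unpack((a * b) % M) == [10, 40, 90, 160]
```

Because each packed operation touches every slot, the number of homomorphic operations shrinks roughly by the number of slots, which is the source of the runtime reductions described above.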
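For context, the following plaintext sketch shows the regularized (ridge) linear-regression computation that such two-server protocols evaluate under encryption: each data owner can locally compute the aggregates X_i^T X_i and X_i^T y_i, the servers sum these aggregates homomorphically, and the model is obtained by solving the resulting linear system. The shapes match the 1000 x 40 task from the abstract; the regularization weight, the random data, and the omission of the masking and decryption steps are all simplifying assumptions.

```python
# Plaintext sketch of the regularized linear regression computed under
# encryption by the protocol; masking/decryption steps are omitted, and the
# regularization weight and synthetic data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 40                        # samples x features, as in the abstract
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
lam = 0.1                              # ridge weight (assumed, not from the paper)

# Each data owner computes its local aggregates X_i^T X_i (d x d) and
# X_i^T y_i (length d); the servers only need to sum these under encryption.
A = X.T @ X + lam * np.eye(d)
b = X.T @ y
w = np.linalg.solve(A, b)              # in the protocol, solved on masked values
print(w[:5])
```

Since the per-owner aggregates are only d x d and d x 1, the homomorphic work scales with the (small) feature dimension rather than the (large) number of samples, which is why packing the aggregates efficiently matters so much.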