Privacy-Preserving Machine Learning Algorithms for Big Data Systems

Kaihe Xu, Hao Yue, Linke Guo, Yuanxiong Guo, Yuguang Fang
Published in: 2015 IEEE 35th International Conference on Distributed Computing Systems
Publication date: 2015-06-01
DOI: 10.1109/ICDCS.2015.40 (https://doi.org/10.1109/ICDCS.2015.40)
Citations: 76

Abstract

Machine learning plays an increasingly important role in big data systems because it can efficiently discover valuable knowledge and hidden information. Big data applications such as healthcare or financial systems often involve multiple organizations that have different privacy policies and may not share their data explicitly, even though joint data processing may be required. Thus, how to share big data among distributed data processing entities while mitigating privacy concerns becomes a challenging problem. Traditional methods rely on cryptographic tools and/or randomization to preserve privacy. Unfortunately, these alone may be inadequate for emerging big data systems because they are mainly designed for traditional small-scale data sets. In this paper, we propose a novel framework to achieve privacy-preserving machine learning where the training data are distributed and each shared data portion is of large volume. Specifically, we utilize the data locality property of the Apache Hadoop architecture and only a limited number of cryptographic operations at the Reduce() procedures to achieve privacy preservation. We show that the proposed scheme is secure in the semi-honest model and use extensive simulations to demonstrate its scalability and correctness.
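The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea it describes: heavy computation stays with each party's local data (the Map side), and a small amount of cryptographic work happens only when partial results are combined (the Reduce side). Everything here is an assumption made for illustration, not the authors' scheme: the one-dimensional least-squares gradient, the additive pairwise masking used as the "cryptographic operation", and the names `local_map`, `pairwise_masks`, `secure_reduce`, and `private_gradient`. The sketch runs in plain Python rather than on Hadoop.

```python
import random

MOD = 2**61 - 1   # modulus for masked integer arithmetic (illustrative choice)
SCALE = 10**6     # fixed-point scale for encoding real values as integers

def local_map(records, w):
    """Map side: one party's gradient sum for 1-D least squares y ~ w*x.
    Runs where the data lives; raw records never leave the party."""
    return sum((w * x - y) * x for x, y in records)

def encode(value):
    """Encode a real number as a fixed-point residue modulo MOD."""
    return round(value * SCALE) % MOD

def decode(residue):
    """Invert encode(), mapping large residues back to negative numbers."""
    if residue > MOD // 2:
        residue -= MOD
    return residue / SCALE

def pairwise_masks(n_parties, rng):
    """Build one mask per party such that all masks sum to 0 modulo MOD.
    Assumption: in a real deployment each pair of parties would agree on
    its shared random value privately; here one function generates them."""
    masks = [0] * n_parties
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            r = rng.randrange(MOD)
            masks[i] = (masks[i] + r) % MOD
            masks[j] = (masks[j] - r) % MOD
    return masks

def secure_reduce(masked_values):
    """Reduce side: sums masked contributions. The masks cancel in the
    total, so only the aggregate is revealed to the reducer."""
    return sum(masked_values) % MOD

def private_gradient(parties, w, rng):
    """Combine local Map outputs into a global gradient via masked sums."""
    masks = pairwise_masks(len(parties), rng)
    masked = [(encode(local_map(recs, w)) + m) % MOD
              for recs, m in zip(parties, masks)]
    return decode(secure_reduce(masked))

# Three hypothetical organizations, each holding its own (x, y) records.
parties = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0), (5.0, 10.0)]]
gradient = private_gradient(parties, w=1.0, rng=random.Random(0))
```

In this sketch the reducer sees only masked values, each of which looks random on its own, which matches the semi-honest setting in spirit: parties follow the protocol but may try to learn from what they receive. The cryptographic cost is one mask addition per party per round, independent of how many records each party holds.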