Distributed Differentially Private Mutual Information Ranking and Its Applications

2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...
Pub Date: 2020-08-01
DOI: 10.1109/IRI49571.2020.00021

Ankit Srivastava, Samira Pouyanfar, Joshua Allen, Ken Johnston, Qida Ma

Pages: 90-96


Abstract

Mutual Information (MI) quantifies the amount of information shared between a pair of random variables. Automated feature selection techniques based on MI ranking are regularly used to extract information from sensitive datasets that exceed petabytes in size and span millions of features and classes. A series of one-vs-all MI computations can be cascaded to produce n-fold MI results, rapidly pinpointing informative relationships. This ability to quickly pinpoint the most informative relationships in datasets covering billions of users creates privacy concerns. In this paper, we present Distributed Differentially Private Mutual Information (DDP-MI), a privacy-safe, fast batch MI computation, and apply it across scenarios such as feature selection, segmentation, ranking, and query expansion. The distributed implementation is protected with global-model differential privacy to provide strong assurances against a wide range of privacy attacks. We also show that, on a large-scale public dataset, DDP-MI can substantially improve the efficiency of MI calculations compared to standard implementations.
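To make the idea of a differentially private MI estimate concrete, the sketch below computes MI from a contingency table whose cells have been perturbed with Laplace noise. It is an illustration only and not the DDP-MI implementation, whose details the abstract does not give. The function names (`laplace_noise`, `mutual_information`, `dp_mutual_information`) are hypothetical, and the sketch assumes that each user contributes exactly one (feature, label) record, that the value domains are fixed and public, and that the standard Laplace mechanism with sensitivity 1 on histogram counts is an acceptable stand-in for the paper's global-model mechanism.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mutual_information(joint):
    """MI in bits from a dict mapping (x, y) to a nonnegative count."""
    total = sum(joint.values())
    if total <= 0:
        return 0.0
    px, py = Counter(), Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    mi = 0.0
    for (x, y), c in joint.items():
        if c > 0:
            mi += (c / total) * math.log2(c * total / (px[x] * py[y]))
    return mi


def dp_mutual_information(pairs, x_values, y_values, epsilon, rng):
    """MI computed from Laplace-noised joint counts (assumed mechanism).

    Adding or removing one record changes one cell by 1, so noise with
    scale 1/epsilon is added to every cell of the public domain,
    including cells whose true count is zero.
    """
    counts = Counter(pairs)
    scale = 1.0 / epsilon
    noisy = {}
    for x in x_values:
        for y in y_values:
            # Clamping at zero is post-processing and does not weaken
            # the privacy guarantee of the noisy counts.
            noisy[(x, y)] = max(0.0, counts[(x, y)] + laplace_noise(scale, rng))
    return mutual_information(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: "clicked" follows the label, "region" ignores it.
    labels = [rng.randint(0, 1) for _ in range(10_000)]
    features = {
        "clicked": [y if rng.random() < 0.9 else 1 - y for y in labels],
        "region": [rng.randint(0, 1) for _ in labels],
    }
    scores = {
        name: dp_mutual_information(zip(xs, labels), [0, 1], [0, 1], 1.0, rng)
        for name, xs in features.items()
    }
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 4))
```

Ranking several features this way spends privacy budget on each noisy table, so a real deployment would need to account for composition across features. The sketch also uses ordinary floating-point sampling, which is not hardened against known attacks on naive Laplace implementations.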


