A Submodularity-based Clustering Algorithm for the Information Bottleneck and Privacy Funnel

Ni Ding, P. Sadeghi
Published in: 2019 IEEE Information Theory Workshop (ITW), 2019-08-01
DOI: 10.1109/ITW44776.2019.8989355 (https://doi.org/10.1109/ITW44776.2019.8989355)
Citations: 14

Abstract

For relevant data $S$ nested in the observation $X$, the information bottleneck (IB) aims to encode $X$ into $\hat{X}$ so as to maximize the extracted useful information $I(S;\hat{X})$ at the minimum coding rate $I(X;\hat{X})$. For the dual privacy funnel (PF) problem, where $S$ denotes the sensitive/private data, the goal is to minimize the privacy leakage $I(S;\hat{X})$ while maintaining a certain level of utility $I(X;\hat{X})$. For both problems, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). It starts with the original alphabet $\hat{\mathcal{X}} := \mathcal{X}$ and iteratively merges the elements of the current alphabet $\hat{\mathcal{X}}$ that optimize the Lagrangian function $I(S;\hat{X}) - \lambda I(X;\hat{X})$. We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of $\hat{\mathcal{X}}$ by existing MDSF algorithms. By varying the value of the Lagrangian multiplier $\lambda$, we obtain experimental results on a heart disease data set in terms of the Pareto frontier: $I(S;\hat{X})$ vs. $-I(X;\hat{X})$. We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
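To make the objective concrete, the following is a minimal Python sketch of how the Lagrangian $I(S;\hat{X}) - \lambda I(X;\hat{X})$ might be evaluated for a deterministic clustering of $\mathcal{X}$, together with a greedy pairwise merge loop of the kind the abstract refers to as the existing baseline. It is not the IAC-MDSF algorithm, which searches over all subsets of $\hat{\mathcal{X}}$ using an MDSF solver. The function names (`mutual_information`, `entropy`, `lagrangian`, `greedy_pairwise_merge`), the stopping rule, the choice of maximizing the Lagrangian (the IB direction), and the toy joint distribution are all assumptions made for illustration and do not come from the paper.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as rows joint[a][b]."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for a, r in enumerate(joint):
        for b, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[a] * col[b]))
    return mi


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def lagrangian(p_sx, partition, lam):
    """I(S;Xhat) - lam * I(X;Xhat) for a deterministic clustering of X.

    p_sx[s][x] is the joint distribution of (S, X); partition is a list of
    clusters, each a list of column indices. Because Xhat is a deterministic
    function of X here, I(X;Xhat) equals H(Xhat).
    """
    p_sxhat = [[sum(row[x] for x in cluster) for cluster in partition]
               for row in p_sx]
    p_xhat = [sum(col) for col in zip(*p_sxhat)]
    return mutual_information(p_sxhat) - lam * entropy(p_xhat)


def greedy_pairwise_merge(p_sx, lam):
    """Illustrative baseline: merge the best pair until no merge helps."""
    partition = [[x] for x in range(len(p_sx[0]))]
    best = lagrangian(p_sx, partition, lam)
    while len(partition) > 1:
        candidates = []
        for i, j in combinations(range(len(partition)), 2):
            merged = [c for k, c in enumerate(partition) if k not in (i, j)]
            merged.append(partition[i] + partition[j])
            candidates.append((lagrangian(p_sx, merged, lam), merged))
        value, merged = max(candidates, key=lambda t: t[0])
        if value <= best:  # assumed stopping rule: stop when nothing improves
            break
        best, partition = value, merged
    return partition, best


# Toy joint distribution (made up): symbols 0 and 1 carry the same
# information about S, and so do symbols 2 and 3.
toy_p_sx = [[0.20, 0.20, 0.05, 0.05],
            [0.05, 0.05, 0.20, 0.20]]
clusters, value = greedy_pairwise_merge(toy_p_sx, lam=0.1)
```

On the toy distribution, merging symbols with identical conditionals $p(s \mid x)$ lowers the rate term without losing any information about $S$, so the loop groups symbols 0 and 1 together and symbols 2 and 3 together, then stops because merging everything would discard all information about $S$. Sweeping `lam` and recording the two mutual information terms is one way such a sketch could trace out a trade-off curve of the kind described above.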