Differentially Private Stream Processing for the Semantic Web

Daniele Dell'Aglio, A. Bernstein
Proceedings of The Web Conference 2020, published 2020-04-20. DOI: 10.1145/3366423.3380265
Citation count: 3

Abstract

Data often contains sensitive information, which poses a major obstacle to publishing it. Some suggest obfuscating the data or releasing only some data statistics. These approaches have, however, been shown to provide insufficient safeguards against de-anonymisation. Recently, differential privacy (DP), an approach that injects noise into query answers to provide statistical privacy guarantees, has emerged as a solution for releasing sensitive data. This study investigates how to continuously release privacy-preserving histograms (or distributions) from online streams of sensitive data by combining DP and semantic web technologies. We focus on distributions, as they are the basis for many analytic applications. Specifically, we propose SihlQL, a query language that processes RDF streams in a privacy-preserving fashion. SihlQL builds on top of SPARQL and the w-event DP framework. We show how some peculiarities of w-event privacy constrain the expressiveness of SihlQL queries. To address these constraints, we propose an extension of w-event privacy that answers a larger class of queries while preserving privacy. To evaluate SihlQL, we implemented a prototype engine that compiles queries to Apache Flink topologies, and we studied its privacy properties using real-world data from an IPTV provider and an online e-commerce website.
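The abstract rests on two ideas: DP releases a noisy version of each query answer, and w-event privacy bounds the total privacy budget spent over any window of w consecutive timestamps. The following is a minimal Python sketch of both, under loudly stated assumptions: it uses the Laplace mechanism, assumes each individual contributes at most one event per timestamp (so a histogram has L1 sensitivity 1), and uses the simplest baseline allocation of ε/w per timestamp, so by sequential composition any w consecutive releases spend at most ε. The function names are hypothetical, and this is not SihlQL's compilation to Apache Flink nor the extended w-event mechanism the paper proposes.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(events, bins, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism on a histogram (hypothetical helper).

    Assumption: adding or removing one individual changes the counts by at most
    `sensitivity` in L1 norm, so noise of scale sensitivity / epsilon per bin
    gives epsilon-DP for this single release.
    """
    rng = rng or random.Random()
    counts = Counter(e for e in events if e in bins)
    scale = sensitivity / epsilon
    return {b: counts[b] + laplace_noise(scale, rng) for b in bins}


def uniform_w_event(stream, bins, epsilon, w, rng=None):
    """Release one noisy histogram per timestamp, spending epsilon / w at each step.

    Any window of w consecutive timestamps therefore consumes at most epsilon,
    which is the budget constraint of w-event privacy (uniform baseline only).
    """
    rng = rng or random.Random()
    eps_per_step = epsilon / w
    for events in stream:
        yield private_histogram(events, bins, eps_per_step, rng=rng)


if __name__ == "__main__":
    # Hypothetical IPTV-like stream: each inner list holds the genres watched at one timestamp.
    stream = [["news", "sport", "news"], ["movies"], ["sport", "sport"]]
    bins = ["news", "sport", "movies"]
    for t, hist in enumerate(uniform_w_event(stream, bins, epsilon=1.0, w=3, rng=random.Random(42))):
        print(t, {k: round(v, 2) for k, v in hist.items()})
```

Splitting ε evenly is easy to reason about but adds the most noise at every step; adaptive allocations that save budget when the stream changes little are the usual refinement in the w-event literature.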