
Latest Publications in Proceedings on Privacy Enhancing Technologies (PoPETs)

Private Graph Extraction via Feature Explanations
Iyiola E. Olatunji, Mandeep Rathee, Thorben Funke, Megha Khosla
Privacy and interpretability are two important ingredients for achieving trustworthy machine learning. We study the interplay of these two aspects in graph machine learning through graph reconstruction attacks. The goal of the adversary here is to reconstruct the graph structure of the training data given access to model explanations. Based on the different kinds of auxiliary information available to the adversary, we propose several graph reconstruction attacks. We show that additional knowledge of post-hoc feature explanations substantially increases the success rate of these attacks. Further, we investigate in detail the differences between attack performance with respect to three different classes of explanation methods for graph neural networks: gradient-based, perturbation-based, and surrogate model-based methods. While gradient-based explanations reveal the most in terms of the graph structure, we find that these explanations do not always score high in utility. For the other two classes of explanations, privacy leakage increases with an increase in explanation utility. Finally, we propose a defense based on a randomized response mechanism for releasing the explanations, which substantially reduces the attack success rate. Our code is available at https://github.com/iyempissy/graph-stealing-attacks-with-explanation.
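A minimal sketch of the randomized-response idea behind the proposed defense, assuming the explanation is released as a binary mask and that each bit is perturbed independently (the paper's exact mechanism and parameters may differ):

```python
import numpy as np

def randomized_response(mask: np.ndarray, epsilon: float, rng=None) -> np.ndarray:
    """Perturb a binary explanation mask before release.

    Each bit is kept with probability p = e^eps / (1 + e^eps) and flipped
    otherwise, which gives epsilon-local differential privacy per bit.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    flip = rng.random(mask.shape) >= p_keep  # True where the bit gets flipped
    return np.where(flip, 1 - mask, mask)

# Example: privatize a 6-bit feature-explanation mask
mask = np.array([1, 0, 1, 1, 0, 0])
print(randomized_response(mask, epsilon=1.0))
```

Lower epsilon flips more bits, trading explanation utility for a lower attack success rate, matching the utility/privacy trade-off the abstract describes.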
{"title":"Private Graph Extraction via Feature Explanations","authors":"Iyiola E. Olatunji, Mandeep Rathee, Thorben Funke, Megha Khosla","doi":"10.56553/popets-2023-0041","DOIUrl":"https://doi.org/10.56553/popets-2023-0041","url":null,"abstract":"Privacy and interpretability are two important ingredients for achieving trustworthy machine learning. We study the interplay of these two aspects in graph machine learning through graph reconstruction attacks. The goal of the adversary here is to reconstruct the graph structure of the training data given access to model explanations. Based on the different kinds of auxiliary information available to the adversary, we propose several graph reconstruction attacks. We show that additional knowledge of post-hoc feature explanations substantially increases the success rate of these attacks. Further, we investigate in detail the differences between attack performance with respect to three different classes of explanation methods for graph neural networks: gradient-based, perturbation-based, and surrogate model-based methods. While gradient-based explanations reveal the most in terms of the graph structure, we find that these explanations do not always score high in utility. For the other two classes of explanations, privacy leakage increases with an increase in explanation utility. Finally, we propose a defense based on a randomized response mechanism for releasing the explanations, which substantially reduces the attack success rate. Our code is available at https://github.com/iyempissy/graph-stealing-attacks-with-explanation.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135018474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Watching your call: Breaking VoLTE Privacy in LTE/5G Networks
Zishuai Cheng, Mihai Ordean, Flavio Garcia, Baojiang Cui, Dominik Rys
Voice over LTE (VoLTE) and Voice over NR (VoNR) are two similar technologies that have been widely deployed by operators to provide a better calling experience in LTE and 5G networks, respectively. The VoLTE/NR protocols rely on the security features of the underlying LTE/5G network to protect users' privacy such that nobody can monitor calls and learn details about call times, duration, and direction. In this paper, we introduce a new privacy attack which enables adversaries to analyse encrypted LTE/5G traffic and recover any VoLTE/NR call details. We achieve this by implementing a novel mobile-relay adversary which is able to remain undetected by using an improved physical layer parameter guessing procedure. This adversary facilitates the recovery of encrypted configuration messages exchanged between victim devices and the mobile network. We further propose an identity mapping method which enables our mobile-relay adversary to link a victim's network identifiers to the phone number efficiently, requiring a single VoLTE protocol message. We evaluate the real-world performance of our attacks using four modern commercial off-the-shelf phones and two representative, commercial network carriers. We collect over 60 hours of traffic between the phones and the mobile networks and execute 160 VoLTE calls, which we use to successfully identify patterns in the physical layer parameter allocation and in VoLTE traffic, respectively. Our real-world experiments show that our mobile-relay works as expected in all test cases, and the VoLTE activity logs recovered describe the actual communication with 100% accuracy. Finally, we show that we can link network identifiers such as International Mobile Subscriber Identities (IMSI), Subscriber Concealed Identifiers (SUCI) and/or Globally Unique Temporary Identifiers (GUTI) to phone numbers while remaining undetected by the victim.
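A toy sketch of the timing-correlation intuition behind the identity-mapping step, with hypothetical data: place test calls to the victim's number and vote for the network identifier whose signaling activity co-occurs with them. The paper's actual method is far stronger (it requires a single VoLTE protocol message and operates on encrypted traffic via a mobile-relay), which this sketch does not model:

```python
from collections import Counter

def link_identifier(call_times, bursts, window=1.0):
    """bursts: (timestamp, identifier) pairs for observed signaling bursts;
    call_times: timestamps of test calls placed to the victim's number."""
    votes = Counter(ident for t, ident in bursts
                    if any(abs(t - c) <= window for c in call_times))
    return votes.most_common(1)[0][0] if votes else None

# Toy run: identifier "G2" is active around each of our three calls
calls = [10.0, 50.0, 90.0]
obs = [(10.2, "G2"), (33.0, "G1"), (50.4, "G2"), (89.7, "G2")]
print(link_identifier(calls, obs))  # -> G2
```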
{"title":"Watching your call: Breaking VoLTE Privacy in LTE/5G Networks","authors":"Zishuai Cheng, Mihai Ordean, Flavio Garcia, Baojiang Cui, Dominik Rys","doi":"10.56553/popets-2023-0053","DOIUrl":"https://doi.org/10.56553/popets-2023-0053","url":null,"abstract":"Voice over LTE (VoLTE) and Voice over NR (VoNR), are two similar technologies that have been widely deployed by operators to provide a better calling experience in LTE and 5G networks, respectively. The VoLTE/NR protocols rely on the security features of the underlying LTE/5G network to protect users' privacy such that nobody can monitor calls and learn details about call times, duration, and direction. In this paper, we introduce a new privacy attack which enables adversaries to analyse encrypted LTE/5G traffic and recover any VoLTE/NR call details. We achieve this by implementing a novel mobile-relay adversary which is able to remain undetected by using an improved physical layer parameter guessing procedure. This adversary facilitates the recovery of encrypted configuration messages exchanged between victim devices and the mobile network. We further propose an identity mapping method which enables our mobile-relay adversary to link a victim's network identifiers to the phone number efficiently, requiring a single VoLTE protocol message. We evaluate the real-world performance of our attacks using four modern commercial off-the-shelf phones and two representative, commercial network carriers. We collect over 60 hours of traffic between the phones and the mobile networks and execute 160 VoLTE calls, which we use to successfully identify patterns in the physical layer parameter allocation and in VoLTE traffic, respectively. Our real-world experiments show that our mobile-relay works as expected in all test cases, and the VoLTE activity logs recovered describe the actual communication with 100% accuracy. Finally, we show that we can link network identifiers such as International Mobile Subscriber Identities (IMSI), Subscriber Concealed Identifiers (SUCI) and/or Globally Unique Temporary Identifiers (GUTI) to phone numbers while remaining undetected by the victim.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135018477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Lessons Learned: Surveying the Practicality of Differential Privacy in the Industry
Gonzalo Munilla Garrido, Xiaoyuan Liu, Florian Matthes, Dawn Song
Since its introduction in 2006, differential privacy has emerged as a predominant statistical tool for quantifying data privacy in academic works. Yet despite the plethora of research and open-source utilities that have accompanied its rise, with limited exceptions, differential privacy has failed to achieve widespread adoption in the enterprise domain. Our study aims to shed light on the fundamental causes underlying this academic-industrial utilization gap through detailed interviews of 24 privacy practitioners across 9 major companies. We analyze the results of our survey to provide key findings and suggestions for companies striving to improve privacy protection in their data workflows and highlight the necessary and missing requirements of existing differential privacy tools, with the goal of guiding researchers working towards the broader adoption of differential privacy. Our findings indicate that analysts suffer from lengthy bureaucratic processes for requesting access to sensitive data, yet once granted, only scarcely-enforced privacy policies stand between rogue practitioners and misuse of private information. We thus argue that differential privacy can significantly improve the processes of requesting and conducting data exploration across silos, and conclude that with a few of the improvements suggested herein, the practical use of differential privacy across the enterprise is within striking distance.
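As background on the tool whose adoption the survey examines, a textbook sketch of the Laplace mechanism, the canonical differential-privacy building block (illustrative only, not drawn from any tool the study covers):

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, rng=None) -> float:
    """Release a counting query (sensitivity 1) with Laplace noise of
    scale 1/epsilon, satisfying epsilon-differential privacy."""
    rng = np.random.default_rng() if rng is None else rng
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

print(laplace_count(1042, epsilon=0.5))  # noisy count for release
```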
{"title":"Lessons Learned: Surveying the Practicality of Differential Privacy in the Industry","authors":"Gonzalo Munilla Garrido, Xiaoyuan Liu, Floria Matthes, Dawn Song","doi":"10.56553/popets-2023-0045","DOIUrl":"https://doi.org/10.56553/popets-2023-0045","url":null,"abstract":"Since its introduction in 2006, differential privacy has emerged as a predominant statistical tool for quantifying data privacy in academic works. Yet despite the plethora of research and open-source utilities that have accompanied its rise, with limited exceptions, differential privacy has failed to achieve widespread adoption in the enterprise domain. Our study aims to shed light on the fundamental causes underlying this academic-industrial utilization gap through detailed interviews of 24 privacy practitioners across 9 major companies. We analyze the results of our survey to provide key findings and suggestions for companies striving to improve privacy protection in their data workflows and highlight the necessary and missing requirements of existing differential privacy tools, with the goal of guiding researchers working towards the broader adoption of differential privacy. Our findings indicate that analysts suffer from lengthy bureaucratic processes for requesting access to sensitive data, yet once granted, only scarcely-enforced privacy policies stand between rogue practitioners and misuse of private information. We thus argue that differential privacy can significantly improve the processes of requesting and conducting data exploration across silos, and conclude that with a few of the improvements suggested herein, the practical use of differential privacy across the enterprise is within striking distance.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136148913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Unified Framework for Quantifying Privacy Risk in Synthetic Data
Matteo Giomi, Franziska Boenisch, Christoph Wehmeyer, Borbála Tasnádi
Synthetic data is often presented as a method for sharing sensitive information in a privacy-preserving manner by reproducing the global statistical properties of the original data without disclosing sensitive information about any individual. In practice, as with other anonymization methods, synthetic data cannot entirely eliminate privacy risks. These residual privacy risks instead need to be uncovered and assessed ex post. However, quantifying the actual privacy risks of any synthetic dataset is a hard task, given the multitude of facets of data privacy. We present Anonymeter, a statistical framework to jointly quantify different types of privacy risks in synthetic tabular datasets. We equip this framework with attack-based evaluations for the singling out, linkability, and inference risks, which are the three key indicators of factual anonymization according to data protection regulations, such as the European General Data Protection Regulation (GDPR). To the best of our knowledge, we are the first to introduce a coherent and legally aligned evaluation of these three privacy risks for synthetic data, as well as to design privacy attacks which model directly the singling out and linkability risks. We demonstrate the effectiveness of our methods by conducting an extensive set of experiments that measure the privacy risks of data with deliberately inserted privacy leakages, and of synthetic data generated with and without differential privacy. Our results highlight that the three privacy risks reported by our framework scale linearly with the amount of privacy leakage in the data. Furthermore, we observe that synthetic data exhibits the lowest vulnerability against linkability, indicating that one-to-one relationships between real and synthetic data records are not preserved. Finally, with a quantitative comparison we demonstrate that Anonymeter outperforms existing synthetic data privacy evaluation frameworks both in detecting privacy leaks and in computation speed. To contribute to a privacy-conscious usage of synthetic data, we publish Anonymeter as an open-source library (https://github.com/statice/anonymeter).
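A schematic of the linkability notion (not Anonymeter's actual API; see https://github.com/statice/anonymeter for the real evaluators): assuming real and synthetic rows are aligned one-to-one, the fraction of real records whose nearest synthetic neighbour is their own counterpart is a crude proxy for preserved one-to-one relationships:

```python
import numpy as np

def linkability_rate(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Share of real records whose nearest synthetic neighbour (squared
    Euclidean distance) is the row generated from them."""
    d = ((real[:, None, :] - synthetic[None, :, :]) ** 2).sum(axis=-1)
    return float((d.argmin(axis=1) == np.arange(len(real))).mean())

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 4))
leaky = real + rng.normal(scale=0.01, size=real.shape)  # near-copies
safe = rng.normal(size=(100, 4))                        # independent draws
print(linkability_rate(real, leaky))  # close to 1.0: high linkability
print(linkability_rate(real, safe))   # close to 1/100: low linkability
```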
{"title":"A Unified Framework for Quantifying Privacy Risk in Synthetic Data","authors":"Matteo Giomi, Franziska Boenisch, Christoph Wehmeyer, Borbála Tasnádi","doi":"10.56553/popets-2023-0055","DOIUrl":"https://doi.org/10.56553/popets-2023-0055","url":null,"abstract":"Synthetic data is often presented as a method for sharing sensitive information in a privacy-preserving manner by reproducing the global statistical properties of the original data without dis closing sensitive information about any individual. In practice, as with other anonymization methods, synthetic data cannot entirely eliminate privacy risks. These residual privacy risks need instead to be ex-post uncovered and assessed. However, quantifying the actual privacy risks of any synthetic dataset is a hard task, given the multitude of facets of data privacy. We present Anonymeter, a statistical framework to jointly quantify different types of privacy risks in synthetic tabular datasets. We equip this framework with attack-based evaluations for the singling out, linkability, and inference risks, which are the three key indicators of factual anonymization according to data protection regulations, such as the European General Data Protection Regulation (GDPR). To the best of our knowledge, we are the first to introduce a coherent and legally aligned evaluation of these three privacy risks for synthetic data, as well as to design privacy attacks which model directly the singling out and linkability risks. We demonstrate the effectiveness of our methods by conducting an extensive set of experiments that measure the privacy risks of data with deliberately inserted privacy leakages, and of synthetic data generated with and without differential privacy. Our results highlight that the three privacy risks reported by our framework scale linearly with the amount of privacy leakage in the data. Furthermore, we observe that synthetic data exhibits the lowest vulnerability against linkability, indicating one-to-one relationships between real and synthetic data records are not preserved. Finally, with a quantitative comparison we demonstrate that Anonymeter outperforms existing synthetic data privacy evaluation frameworks both in terms of detecting privacy leaks, as well as computation speed. To contribute to a privacy-conscious usage of synthetic data, we publish Anonymeter as an open-source library (https://github.com/statice/anonymeter).","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136041647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
DeepSE-WF: Unified Security Estimation for Website Fingerprinting Defenses
Alexander Veicht, Cedric Renggli, Diogo Barradas
Website fingerprinting (WF) attacks, usually conducted with the help of a machine learning-based classifier, enable a network eavesdropper to pinpoint which website a user is accessing through the inspection of traffic patterns. These attacks have been shown to succeed even when users browse the Internet through encrypted tunnels, e.g., through Tor or VPNs. To assess the security of new defenses against WF attacks, recent works have proposed feature-dependent theoretical frameworks that estimate the Bayes error of an adversary's feature set or the mutual information leaked by manually-crafted features. Unfortunately, as WF attacks increasingly rely on deep learning and latent feature spaces, our experiments show that security estimations based on simpler (and less informative) manually-crafted features can no longer be trusted to assess the potential success of a WF adversary in defeating such defenses. In this work, we propose DeepSE-WF, a novel WF security estimation framework that leverages specialized kNN-based estimators to produce Bayes error and mutual information estimates from learned latent feature spaces, thus bridging the gap between current WF attacks and security estimation methods. Our evaluation reveals that DeepSE-WF produces tighter security estimates than previous frameworks, reducing the required computational resources to output security estimations by one order of magnitude.
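As a rough flavour of kNN-based Bayes-error estimation (a simplified stand-in, not DeepSE-WF's specialized estimators): the cross-validated 1-NN error asymptotically brackets the Bayes error R* via the Cover-Hart bound, R_1NN/2 <= R* <= R_1NN:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

def bayes_error_bounds(latent: np.ndarray, labels: np.ndarray):
    """Bracket the Bayes error of a latent feature space with the
    cross-validated 1-NN error (Cover-Hart, asymptotic)."""
    pred = cross_val_predict(KNeighborsClassifier(n_neighbors=1),
                             latent, labels, cv=5)
    r_1nn = float(np.mean(pred != labels))
    return r_1nn / 2.0, r_1nn  # (lower, upper) bounds on R*
```

A lower attainable Bayes error over a defense's latent traffic features means an attacker can, in principle, classify websites more reliably, i.e. the defense leaks more.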
{"title":"DeepSE-WF: Unified Security Estimation for Website Fingerprinting Defenses","authors":"Alexander Veicht, Cedric Renggli, Diogo Barradas","doi":"10.56553/popets-2023-0047","DOIUrl":"https://doi.org/10.56553/popets-2023-0047","url":null,"abstract":"Website fingerprinting (WF) attacks, usually conducted with the help of a machine learning-based classifier, enable a network eavesdropper to pinpoint which website a user is accessing through the inspection of traffic patterns. These attacks have been shown to succeed even when users browse the Internet through encrypted tunnels, e.g., through Tor or VPNs. To assess the security of new defenses against WF attacks, recent works have proposed feature-dependent theoretical frameworks that estimate the Bayes error of an adversary's features set or the mutual information leaked by manually-crafted features. Unfortunately, as WF attacks increasingly rely on deep learning and latent feature spaces, our experiments show that security estimations based on simpler (and less informative) manually-crafted features can no longer be trusted to assess the potential success of a WF adversary in defeating such defenses. In this work, we propose DeepSE-WF, a novel WF security estimation framework that leverages specialized kNN-based estimators to produce Bayes error and mutual information estimates from learned latent feature spaces, thus bridging the gap between current WF attacks and security estimation methods. Our evaluation reveals that DeepSE-WF produces tighter security estimates than previous frameworks, reducing the required computational resources to output security estimations by one order of magnitude.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135018476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline
Haodi Wang, Thang Hoang
Machine Learning as a service (MLaaS) permits resource-limited clients to access powerful data analytics services ubiquitously. Despite its merits, MLaaS poses significant concerns regarding the integrity of delegated computation and the privacy of the server’s model parameters. To address this issue, Zhang et al. (CCS'20) initiated the study of zero-knowledge Machine Learning (zkML). Few zkML schemes have been proposed since; however, they focus on individual ML classification algorithms that may not offer satisfactory accuracy, or that require large-scale training data and model parameters, which may not be desirable for some applications. We propose ezDPS, a new efficient and zero-knowledge ML inference scheme. Unlike prior works, ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy. Each stage of ezDPS is harnessed with an established ML algorithm that is shown to be effective in various applications, including Discrete Wavelet Transformation, Principal Components Analysis, and Support Vector Machine. We design new gadgets to prove ML operations effectively. We fully implemented ezDPS and assessed its performance on real datasets. Experimental results showed that ezDPS is one to three orders of magnitude more efficient than the generic circuit-based approach in all metrics, while maintaining more desirable accuracy than single ML classification approaches.
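In the clear, i.e. without the zero-knowledge proofs that are the paper's actual contribution, the three-stage pipeline corresponds to something like the following scikit-learn/PyWavelets sketch; the wavelet, decomposition level, and component count are illustrative assumptions:

```python
import numpy as np
import pywt  # PyWavelets
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

def dwt_features(X, wavelet="db4", level=2):
    # stage 1: concatenate approximation and detail coefficients per sample
    return np.array([np.concatenate(pywt.wavedec(x, wavelet, level=level))
                     for x in X])

pipeline = make_pipeline(
    FunctionTransformer(dwt_features),  # Discrete Wavelet Transformation
    PCA(n_components=10),               # Principal Components Analysis
    SVC(kernel="rbf"),                  # Support Vector Machine
)
# pipeline.fit(X_train, y_train); pipeline.predict(X_test)
```

ezDPS additionally produces a zero-knowledge proof that each stage was executed correctly on the server's committed model parameters, which no off-the-shelf library call above provides.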
{"title":"ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline","authors":"Haodi Wang, Thang Hoang","doi":"10.56553/popets-2023-0061","DOIUrl":"https://doi.org/10.56553/popets-2023-0061","url":null,"abstract":"Machine Learning as a service (MLaaS) permits resource-limited clients to access powerful data analytics services ubiquitously. Despite its merits, MLaaS poses significant concerns regarding the integrity of delegated computation and the privacy of the server’s model parameters. To address this issue, Zhang et al. (CCS'20) initiated the study of zero-knowledge Machine Learning (zkML). Few zkML schemes have been proposed afterward; however, they focus on sole ML classification algorithms that may not offer satisfactory accuracy or require large-scale training data and model parameters, which may not be desirable for some applications. We propose ezDPS, a new efficient and zero-knowledge ML inference scheme. Unlike prior works, ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy. Each stage of ezDPS is harnessed with an established ML algorithm that is shown to be effective in various applications, including Discrete Wavelet Transformation, Principal Components Analysis, and Support Vector Machine. We design new gadgets to prove ML operations effectively. We fully implemented ezDPS and assessed its performance on real datasets. Experimental results showed that ezDPS achieves one-to-three orders of magnitude more efficient than the generic circuit-based approach in all metrics while maintaining more desirable accuracy than single ML classification approaches.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135018475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Editors' Introduction
Michelle Mazurek, Micah Sherr
{"title":"Editors' Introduction","authors":"Michelle Mazurek, Micah Sherr","doi":"10.56553/popets-2023-0037","DOIUrl":"https://doi.org/10.56553/popets-2023-0037","url":null,"abstract":"","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135016727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unintended Memorization and Timing Attacks in Named Entity Recognition Models
Rana Salal Ali, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Tham Nguyen, Ian David Wood, Mohamed Ali Kaafar
Named entity recognition (NER) models are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting where NER models are available as a black-box service for identifying sensitive information in user documents, and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon NNs are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that words from the training dataset follow different functional paths than previously unseen words, with measurable differences in execution time. Revealing the membership status of training samples has clear privacy implications. For example, in text redaction, sensitive words or phrases to be found and removed are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. This is exacerbated by results that indicate memorization after only a single phrase. We achieved a 70% AUC in our first attack on a text redaction use-case. We also show overwhelming success in the second timing attack with a 99.23% AUC. Finally, we discuss potential mitigation approaches to realize the safe use of NER models in light of the presented privacy and security implications of membership inference attacks.
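A schematic of the timing side-channel measurement, with hypothetical names: `ner` stands for a black-box NER service and `calibrated_threshold` for an attacker-calibrated cutoff, neither of which comes from the paper:

```python
import statistics
import time

def median_latency(fn, text, trials=200):
    """Median wall-clock latency of fn(text) over repeated trials;
    the median damps scheduling noise in the timing signal."""
    samples = []
    for _ in range(trials):
        t0 = time.perf_counter()
        fn(text)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# Membership guess: in-vocabulary (training) words may take a faster
# code path, so a lower latency suggests membership.
# is_member = median_latency(ner, candidate) < calibrated_threshold
```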
{"title":"Unintended Memorization and Timing Attacks in Named Entity Recognition Models","authors":"Rana Salal Ali, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Tham Nguyen, Ian David Wood, Mohamed Ali Kaafar","doi":"10.56553/popets-2023-0056","DOIUrl":"https://doi.org/10.56553/popets-2023-0056","url":null,"abstract":"Named entity recognition models (NER), are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting when NER models are available as a black-box service for identifying sensitive information in user documents and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon NNs are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that different functional paths of words within the training dataset in contrast to words not previously seen have measurable differences in execution time. Revealing membership status of training samples has clear privacy implications. For example, in text redaction, sensitive words or phrases to be found and removed, are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and a privacy/regulatory issues. This is exacerbated by results that indicate memorization after only a single phrase. We achieved a 70% AUC in our first attack on a text redaction use-case. We also show overwhelming success in the second timing attack with an 99.23% AUC. Finally we discuss potential mitigation approaches to realize the safe use of NER models in light of the presented privacy and security implications of membership inference attacks.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135016728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Privacy Rarely Considered: Exploring Considerations in the Adoption of Third-Party Services by Websites
Christine Utz, Sabrina Amft, Martin Degeling, Thorsten Holz, Sascha Fahl, Florian Schaub
Modern websites frequently use and embed third-party services to facilitate web development, connect to social media, or for monetization. This often introduces privacy issues, as the inclusion of third-party services on a website can allow the third party to collect personal data about the website's visitors. While the prevalence and mechanisms of third-party web tracking have been widely studied, little is known about the decision processes that lead to websites using third-party functionality and whether efforts are being made to protect their visitors' privacy. We report results from an online survey with 395 participants involved in the creation and maintenance of websites. For ten common website functionalities, we investigated whether privacy played a role in decisions about how the functionality is integrated, whether specific efforts for privacy protection were made during integration, and to what degree people are aware of data collection through third parties. We find that ease of integration drives third-party adoption, but visitor privacy is considered if there are legal requirements or respective guidelines. Awareness of data collection and privacy risks is higher if the collection is directly associated with the purpose for which the third-party service is used.
{"title":"Privacy Rarely Considered: Exploring Considerations in the Adoption of Third-Party Services by Websites","authors":"Christine Utz, Sabrina Amft, Martin Degeling, Thorsten Holz, Sascha Fahl, Florian Schaub","doi":"10.56553/popets-2023-0002","DOIUrl":"https://doi.org/10.56553/popets-2023-0002","url":null,"abstract":"Modern websites frequently use and embed third-party services to facilitate web development, connect to social media, or for monetization. This often introduces privacy issues as the inclusion of third-party services on a website can allow the third party to collect personal data about the website's visitors. While the prevalence and mechanisms of third-party web tracking have been widely studied, little is known about the decision processes that lead to websites using third-party functionality and whether efforts are being made to protect their visitors' privacy. We report results from an online survey with 395 participants involved in the creation and maintenance of websites. For ten common website functionalities we investigated if privacy has played a role in decisions about how the functionality is integrated, if specific efforts for privacy protection have been made during integration, and to what degree people are aware of data collection through third parties. We find that ease of integration drives third-party adoption but visitor privacy is considered if there are legal requirements or respective guidelines. Awareness of data collection and privacy risks is higher if the collection is directly associated with the purpose for which the third-party service is used.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135077881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Not Your Average App: A Large-scale Privacy Analysis of Android Browsers
Amogh Pradeep, Álvaro Feal, Julien Gamba, Ashwin Rao, Martina Lindorfer, Narseo Vallina-Rodriguez, David Choffnes
The transparency and privacy behavior of mobile browsers has remained widely unexplored by the research community. In fact, as opposed to regular Android apps, mobile browsers may present contradicting privacy behaviors. On the one hand, they can have access to (and can expose) a unique combination of sensitive user data, from users’ browsing history to permission-protected personally identifiable information (PII) such as unique identifiers and geolocation. On the other hand, they are also in a unique position to protect users’ privacy by limiting data sharing with other parties through ad-blocking features. In this paper, we perform a comparative and empirical analysis of how hundreds of Android web browsers protect or expose user data during browsing sessions. To this end, we collect the largest dataset of Android browsers to date, from the Google Play Store and four Chinese app stores. We then develop a novel analysis pipeline that combines static and dynamic analysis methods to find a wide range of privacy-enhancing (e.g., ad-blocking) and privacy-harming behaviors (e.g., sending browsing histories to third parties, not validating TLS certificates, and exposing PII, including non-resettable identifiers, to third parties) across browsers. We find that various popular apps on both Google Play and Chinese stores exhibit these privacy-harming behaviors, including apps that claim to be privacy-enhancing in their descriptions. Overall, our study not only provides new insights into important yet overlooked considerations for browsers’ adoption and transparency, but also shows that automatic app analysis systems (e.g., sandboxes) need context-specific analysis to reveal such privacy behaviors.
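A much-simplified sketch of the dynamic-analysis step that flags identifier exposure in captured traffic (hypothetical inputs; real pipelines also intercept TLS, decode payloads, and test many more encodings than the two hash digests checked here):

```python
import hashlib

def find_pii_leaks(flows, identifiers):
    """flows: (url, body) string pairs from captured traffic;
    identifiers: name -> value, e.g. {"android_id": "a1b2c3d4e5f6"}.
    Flags flows containing an identifier verbatim or as an
    MD5/SHA-1 hex digest."""
    leaks = []
    for url, body in flows:
        haystack = url + body
        for name, value in identifiers.items():
            encoded = (value,
                       hashlib.md5(value.encode()).hexdigest(),
                       hashlib.sha1(value.encode()).hexdigest())
            if any(e in haystack for e in encoded):
                leaks.append((name, url))
    return leaks
```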
{"title":"Not Your Average App: A Large-scale Privacy Analysis of Android Browsers","authors":"Amogh Pradeep, Álvaro Feal, Julien Gamba, Ashwin Rao, Martina Lindorfer, Narseo Vallina-Rodriguez, David Choffnes","doi":"10.56553/popets-2023-0003","DOIUrl":"https://doi.org/10.56553/popets-2023-0003","url":null,"abstract":"The transparency and privacy behavior of mobile browsers has remained widely unexplored by the research community. In fact, as opposed to regular Android apps, mobile browsers may present contradicting privacy behaviors. On the one end, they can have access to (and can expose) a unique combination of sensitive user data, from users’ browsing history to permission-protected personally identifiable information (PII) such as unique identifiers and geolocation. However, on the other end, they also are in a unique position to protect users’ privacy by limiting data sharing with other parties by implementing ad-blocking features. In this paper, we perform a comparative and empirical analysis on how hundreds of Android web browsers protect or expose user data during browsing sessions. To this end, we collect the largest dataset of Android browsers to date, from the Google Play Store and four Chinese app stores. Then, we developed a novel analysis pipeline that combines static and dynamic analysis methods to find a wide range of privacy-enhancing (e.g., ad-blocking) and privacy-harming behaviors (e.g., sending browsing histories to third parties, not validating TLS certificates, and exposing PII---including non-resettable identifiers---to third parties) across browsers. We find that various popular apps on both Google Play and Chinese stores have these privacy-harming behaviors, including apps that claim to be privacy-enhancing in their descriptions. Overall, our study not only provides new insights into important yet overlooked considerations for browsers’ adoption and transparency, but also that automatic app analysis systems (e.g., sandboxes) need context-specific analysis to reveal such privacy behaviors.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135420120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1