Pub Date: 2021-03-31. DOI: 10.1109/SP40001.2021.00075
Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, G. Fanti, S. Bhat
Fringe groups and organizations have a long history of using euphemisms—ordinary-sounding words with a secret meaning—to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for enforcing policy automatically rely on keyword searches for words on a "ban list", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives [1]. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider "pot" (storage container or marijuana?) or "heater" (household appliance or firearm?). The current generation of social media companies instead hire staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind. This paper will demonstrate unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically, and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our algorithm for detecting euphemisms achieves 30–400% higher detection accuracies of unlabeled euphemisms in a text corpus. Our algorithm for revealing euphemistic meanings of words is the first of its kind, as far as we are aware. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.
{"title":"Self-Supervised Euphemism Detection and Identification for Content Moderation","authors":"Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, G. Fanti, S. Bhat","doi":"10.1109/SP40001.2021.00075","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00075","url":null,"abstract":"Fringe groups and organizations have a long history of using euphemisms—ordinary-sounding words with a secret meaning—to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for enforcing policy automatically rely on keyword searches for words on a \"ban list\", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives [1]. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider \"pot\" (storage container or marijuana?) or \"heater\" (household appliance or firearm?) The current generation of social media companies instead hire staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind.This paper will demonstrate unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically, and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our algorithm for detecting euphemisms achieves 30–400% higher detection accuracies of unlabeled euphemisms in a text corpus. Our algorithm for revealing euphemistic meanings of words is the first of its kind, as far as we are aware. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"4 1","pages":"229-246"},"PeriodicalIF":0.0,"publicationDate":"2021-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88324330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-15. DOI: 10.1109/SP40001.2021.00084
Ethan Cecchetti, Siqiu Yao, Haobin Ni, A. Myers
The disastrous vulnerabilities in smart contracts sharply remind us of our ignorance: we do not know how to write code that is secure in composition with malicious code. Information flow control has long been proposed as a way to achieve compositional security, offering strong guarantees even when combining software from different trust domains. Unfortunately, this appealing story breaks down in the presence of reentrancy attacks. We formalize a general definition of reentrancy and introduce a security condition that allows software modules like smart contracts to protect their key invariants while retaining the expressive power of safe forms of reentrancy. We present a security type system that provably enforces secure information flow; in conjunction with run-time mechanisms, it enforces secure reentrancy even in the presence of unknown code; and it helps locate and correct recent high-profile vulnerabilities.
{"title":"Compositional Security for Reentrant Applications","authors":"Ethan Cecchetti, Siqiu Yao, Haobin Ni, A. Myers","doi":"10.1109/SP40001.2021.00084","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00084","url":null,"abstract":"The disastrous vulnerabilities in smart contracts sharply remind us of our ignorance: we do not know how to write code that is secure in composition with malicious code. Information flow control has long been proposed as a way to achieve compositional security, offering strong guarantees even when combining software from different trust domains. Unfortunately, this appealing story breaks down in the presence of reentrancy attacks. We formalize a general definition of reentrancy and introduce a security condition that allows software modules like smart contracts to protect their key invariants while retaining the expressive power of safe forms of reentrancy. We present a security type system that provably enforces secure information flow; in conjunction with run-time mechanisms, it enforces secure reentrancy even in the presence of unknown code; and it helps locate and correct recent high-profile vulnerabilities.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"49 1","pages":"1249-1267"},"PeriodicalIF":0.0,"publicationDate":"2021-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78177306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-09. DOI: 10.1109/SP40001.2021.00106
Hengrui Jia, Mohammad Yaghini, Christopher A. Choquette-Choo, Natalie Dullerud, Anvith Thudi, Varun Chandrasekaran, Nicolas Papernot
Training machine learning (ML) models typically involves expensive iterative optimization. Once the model’s final parameters are released, there is currently no mechanism for the entity which trained the model to prove that these parameters were indeed the result of this optimization procedure. Such a mechanism would support security of ML applications in several ways. For instance, it would simplify ownership resolution when multiple parties contest ownership of a specific model. It would also facilitate distributed training across untrusted workers, where Byzantine workers might otherwise mount a denial-of-service by returning incorrect model updates. In this paper, we remediate this problem by introducing the concept of proof-of-learning in ML. Inspired by research on both proof-of-work and verified computations, we observe how a seminal training algorithm, stochastic gradient descent, accumulates secret information due to its stochasticity. This produces a natural construction for a proof-of-learning which demonstrates that a party has expended the compute required to obtain a set of model parameters correctly. In particular, our analyses and experiments show that an adversary seeking to illegitimately manufacture a proof-of-learning needs to perform at least as much work as is needed for gradient descent itself. We also instantiate a concrete proof-of-learning mechanism in both of the scenarios described above. In model ownership resolution, it protects the intellectual property of models released publicly. In distributed training, it preserves availability of the training procedure. Our empirical evaluation validates that our proof-of-learning mechanism is robust to variance induced by the hardware (e.g., ML accelerators) and software stacks.
{"title":"Proof-of-Learning: Definitions and Practice","authors":"Hengrui Jia, Mohammad Yaghini, Christopher A. Choquette-Choo, Natalie Dullerud, Anvith Thudi, Varun Chandrasekaran, Nicolas Papernot","doi":"10.1109/SP40001.2021.00106","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00106","url":null,"abstract":"Training machine learning (ML) models typically involves expensive iterative optimization. Once the model’s final parameters are released, there is currently no mechanism for the entity which trained the model to prove that these parameters were indeed the result of this optimization procedure. Such a mechanism would support security of ML applications in several ways. For instance, it would simplify ownership resolution when multiple parties contest ownership of a specific model. It would also facilitate the distributed training across untrusted workers where Byzantine workers might otherwise mount a denial-ofservice by returning incorrect model updates.In this paper, we remediate this problem by introducing the concept of proof-of-learning in ML. Inspired by research on both proof-of-work and verified computations, we observe how a seminal training algorithm, stochastic gradient descent, accumulates secret information due to its stochasticity. This produces a natural construction for a proof-of-learning which demonstrates that a party has expended the compute require to obtain a set of model parameters correctly. In particular, our analyses and experiments show that an adversary seeking to illegitimately manufacture a proof-of-learning needs to perform at least as much work than is needed for gradient descent itself.We also instantiate a concrete proof-of-learning mechanism in both of the scenarios described above. In model ownership resolution, it protects the intellectual property of models released publicly. In distributed training, it preserves availability of the training procedure. Our empirical evaluation validates that our proof-of-learning mechanism is robust to variance induced by the hardware (e.g., ML accelerators) and software stacks.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"87 1","pages":"1039-1056"},"PeriodicalIF":0.0,"publicationDate":"2021-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76096967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-03. DOI: 10.1109/SP40001.2021.00113
Liyi Zhou, Kaihua Qin, Antoine Cully, B. Livshits, Arthur Gervais
Decentralized Finance (DeFi) is a blockchain-asset-enabled finance ecosystem with millions of daily USD transaction volume, billions of USD locked up, and a plethora of newly emerging protocols (for lending, staking, and exchanges). Because all transactions, user balances, and total value locked in DeFi are publicly readable, a natural question that arises is: how can we automatically craft profitable transactions across the intertwined DeFi platforms? In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DeFiPoser-ARB and then create logical DeFi protocol models for a theorem prover in DeFiPoser-SMT. While DeFiPoser-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DeFiPoser-SMT can detect more complicated profitable transactions. We estimate that DeFiPoser-ARB and DeFiPoser-SMT can generate an average weekly revenue of 191.48 ETH (76,592 USD) and 72.44 ETH (28,976 USD) respectively, with the highest transaction revenue being 81.31 ETH (32,524 USD) and 22.40 ETH (8,960 USD) respectively. We further show that DeFiPoser-SMT finds the known economic bZx attack from February 2020, which yielded 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days and considers 96 DeFi protocol actions and 25 assets. Looking beyond the financial gains mentioned above, forks deteriorate blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DeFiPoser-ARB and DeFiPoser-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874×. Given optimal adversarial strategies provided by a Markov Decision Process (MDP), we quantify the value threshold at which a profitable transaction qualifies as Miner Extractable Value (MEV) and would incentivize MEV-aware miners to fork the blockchain. For instance, we find that on Ethereum, a miner with a hash rate of 10% would fork the blockchain if an MEV opportunity exceeds 4× the block reward.
{"title":"On the Just-In-Time Discovery of Profit-Generating Transactions in DeFi Protocols","authors":"Liyi Zhou, Kaihua Qin, Antoine Cully, B. Livshits, Arthur Gervais","doi":"10.1109/SP40001.2021.00113","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00113","url":null,"abstract":"Decentralized Finance (DeFi) is a blockchain-asset-enabled finance ecosystem with millions of daily USD transaction volume, billions of locked up USD, as well as a plethora of newly emerging protocols (for lending, staking, and exchanges). Because all transactions, user balances, and total value locked in DeFi are publicly readable, a natural question that arises is: how can we automatically craft profitable transactions across the intertwined DeFi platforms?In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DeFiPoser-ARB and then create logical DeFi protocol models for a theorem prover in DeFiPoser-SMT. While DeFiPoser-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DeFiPoser-SMT can detect more complicated profitable transactions. We estimate that DeFiPoser-ARB and DeFiPoser-SMT can generate an average weekly revenue of 191.48 ETH (76,592 USD) and 72.44 ETH (28,976 USD) respectively, with the highest transaction revenue being 81.31 ETH (32,524 USD) and 22.40 ETH (8,960 USD) respectively. We further show that DeFiPoser-SMT finds the known economic bZx attack from February 2020, which yields 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days, given 96 DeFi protocol actions, and 25 assets.Looking beyond the financial gains mentioned above, forks deteriorate the blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DeFiPoser-ARB and DeFiPoser-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874×. Given optimal adversarial strategies provided by a Markov Decision Process (MDP), we quantify the value threshold at which a profitable transaction qualifies as Miner Extractable Value (MEV) and would incentivize MEV-aware miners to fork the blockchain. For instance, we find that on Ethereum, a miner with a hash rate of 10% would fork the blockchain if an MEV opportunity exceeds 4× the block reward.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"10 1","pages":"919-936"},"PeriodicalIF":0.0,"publicationDate":"2021-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84505002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-18. DOI: 10.1109/SP40001.2021.00068
Alexander Viand, Patrick Jattke, Anwar Hithnawi
Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. Hence, it provides resilience in situations where computations are carried out by an untrusted or potentially compromised party. This powerful concept was first conceived by Rivest et al. in the 1970s. However, it remained unrealized until Craig Gentry presented the first feasible FHE scheme in 2009. The massive collection of sensitive data in cloud services, coupled with a plague of data breaches, has moved highly regulated businesses to increasingly demand confidential and secure computing solutions. This demand, in turn, has led to a recent surge in the development of FHE tools. To understand the landscape of recent FHE tool developments, we conduct an extensive survey and experimental evaluation to explore the current state of the art and identify areas for future development. In this paper, we survey, evaluate, and systematize FHE tools and compilers. We perform experiments to evaluate these tools’ performance and usability aspects on a variety of applications. We conclude with recommendations for developers intending to develop FHE-based applications and a discussion on future directions for FHE tools development.
{"title":"SoK: Fully Homomorphic Encryption Compilers","authors":"Alexander Viand, Patrick Jattke, Anwar Hithnawi","doi":"10.1109/SP40001.2021.00068","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00068","url":null,"abstract":"Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. Hence, it provides resilience in situations where computations are carried out by an untrusted or potentially compromised party. This powerful concept was first conceived by Rivest et al. in the 1970s. However, it remained unrealized until Craig Gentry presented the first feasible FHE scheme in 2009.The advent of the massive collection of sensitive data in cloud services, coupled with a plague of data breaches, moved highly regulated businesses to increasingly demand confidential and secure computing solutions. This demand, in turn, has led to a recent surge in the development of FHE tools. To understand the landscape of recent FHE tool developments, we conduct an extensive survey and experimental evaluation to explore the current state of the art and identify areas for future development.In this paper, we survey, evaluate, and systematize FHE tools and compilers. We perform experiments to evaluate these tools’ performance and usability aspects on a variety of applications. We conclude with recommendations for developers intending to develop FHE-based applications and a discussion on future directions for FHE tools development.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"105 1","pages":"1092-1108"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85830037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-11. DOI: 10.1109/SP40001.2021.00069
Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini
Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D′ that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private. Hence, the purpose of privacy analysis is to upper bound the probability that any adversary could successfully guess which dataset the model was trained on. In our paper, we instantiate this hypothetical adversary in order to establish lower bounds on the probability that this distinguishing game can be won. We use this adversary to evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms. For DP-SGD, the most common method for training neural networks with differential privacy, our lower bounds are tight and match the theoretical upper bound. This implies that in order to prove better upper bounds, it will be necessary to make use of additional assumptions. Fortunately, we find that our attacks are significantly weaker when additional (realistic) restrictions are put in place on the adversary's capabilities. Thus, in the practical setting common to many real-world deployments, there is a gap between our lower bounds and the upper bounds provided by the analysis: differential privacy is conservative and adversaries may not be able to leak as much information as suggested by the theoretical bound.
{"title":"Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning","authors":"Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini","doi":"10.1109/SP40001.2021.00069","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00069","url":null,"abstract":"Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D′ that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private. Hence, the purpose of privacy analysis is to upper bound the probability that any adversary could successfully guess which dataset the model was trained on.In our paper, we instantiate this hypothetical adversary in order to establish lower bounds on the probability that this distinguishing game can be won. We use this adversary to evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms.For DP-SGD, the most common method for training neural networks with differential privacy, our lower bounds are tight and match the theoretical upper bound. This implies that in order to prove better upper bounds, it will be necessary to make use of additional assumptions. Fortunately, we find that our attacks are significantly weaker when additional (realistic) restrictions are put in place on the adversary's capabilities. Thus, in the practical setting common to many real-world deployments, there is a gap between our lower bounds and the upper bounds provided by the analysis: differential privacy is conservative and adversaries may not be able to leak as much information as suggested by the theoretical bound.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"37 1","pages":"866-882"},"PeriodicalIF":0.0,"publicationDate":"2021-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84688382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-06. DOI: 10.1109/SP40001.2021.00057
T. D. Nguyen, Long H. Pham, Jun Sun
Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutions and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on their applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essential that they are guaranteed to be free of vulnerabilities. Unfortunately, smart contract languages such as Solidity are Turing-complete, which implies that verifying them statically is infeasible. Thus, alternative approaches must be developed to provide the guarantee. In this work, we develop an approach which automatically transforms smart contracts so that they are provably free of four common kinds of vulnerabilities. The key idea is to apply run-time verification in an efficient and provably correct manner. Experimental results on 5,000 smart contracts show that our approach incurs minor run-time overhead in terms of time (14.79%) and gas (0.79%).
{"title":"SGUARD: Towards Fixing Vulnerable Smart Contracts Automatically","authors":"T. D. Nguyen, Long H. Pham, Jun Sun","doi":"10.1109/SP40001.2021.00057","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00057","url":null,"abstract":"Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutes and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on its applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essential that smart contracts are guaranteed to be free of vulnerabilities. Unfortunately, smart contract languages such as Solidity are Turing-complete, which implies that verifying them statically is infeasible. Thus, alternative approaches must be developed to provide the guarantee. In this work, we develop an approach which automatically transforms smart contracts so that they are provably free of 4 common kinds of vulnerabilities. The key idea is to apply run-time verification in an efficient and provably correct manner. Experiment results with 5000 smart contracts show that our approach incurs minor run-time overhead in terms of time (i.e., 14.79%) and gas (i.e., 0.79%).","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"74 1","pages":"1215-1229"},"PeriodicalIF":0.0,"publicationDate":"2021-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90980611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-04. DOI: 10.1109/SP40001.2021.00060
Subhajit Roy, Justin Hsu, Aws Albarghouthi
Differential privacy is a formal, mathematical definition of data privacy that has gained traction in academia, industry, and government. The task of correctly constructing differentially private algorithms is non-trivial, and mistakes have been made in foundational algorithms. Currently, there is no automated support for converting an existing, non-private program into a differentially private version. In this paper, we propose a technique for automatically learning an accurate and differentially private version of a given non-private program. We show how to solve this difficult program synthesis problem via a combination of techniques: carefully picking representative example inputs, reducing the problem to continuous optimization, and mapping the results back to symbolic expressions. We demonstrate that our approach is able to learn foundational algorithms from the differential privacy literature and significantly outperforms natural program synthesis baselines.
{"title":"Learning Differentially Private Mechanisms","authors":"Subhajit Roy, Justin Hsu, Aws Albarghouthi","doi":"10.1109/SP40001.2021.00060","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00060","url":null,"abstract":"Differential privacy is a formal, mathematical definition of data privacy that has gained traction in academia, industry, and government. The task of correctly constructing differentially private algorithms is non-trivial, and mistakes have been made in foundational algorithms. Currently, there is no automated support for converting an existing, non-private program into a differentially private version. In this paper, we propose a technique for automatically learning an accurate and differentially private version of a given non-private program. We show how to solve this difficult program synthesis problem via a combination of techniques: carefully picking representative example inputs, reducing the problem to continuous optimization, and mapping the results back to symbolic expressions. We demonstrate that our approach is able to learn foundational algorithms from the differential privacy literature and significantly outperforms natural program synthesis baselines.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"238 1","pages":"852-865"},"PeriodicalIF":0.0,"publicationDate":"2021-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72744218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-29. DOI: 10.1109/SP40001.2021.00048
D. Boneh, Elette Boyle, Henry Corrigan-Gibbs, N. Gilboa, Y. Ishai
This paper presents a new protocol for solving the private heavy-hitters problem. In this problem, there are many clients and a small set of data-collection servers. Each client holds a private bitstring. The servers want to recover the set of all popular strings, without learning anything else about any client’s string. A web-browser vendor, for instance, can use our protocol to figure out which homepages are popular, without learning any user’s homepage. We also consider the simpler private subset-histogram problem, in which the servers want to count how many clients hold strings in a particular set without revealing this set to the clients. Our protocols use two data-collection servers and, in a protocol run, each client sends only a single message to the servers. Our protocols protect client privacy against arbitrary misbehavior by one of the servers and our approach requires no public-key cryptography (except for secure channels), nor general-purpose multiparty computation. Instead, we rely on incremental distributed point functions, a new cryptographic tool that allows a client to succinctly secret-share the labels on the nodes of an exponentially large binary tree, provided that the tree has a single non-zero path. Along the way, we develop new general tools for providing malicious security in applications of distributed point functions. A limitation of our heavy-hitters protocol is that it reveals to the servers slightly more information than the set of popular strings itself. We precisely define and quantify this leakage and explain how to ameliorate its effects. In an experimental evaluation with two servers on opposite sides of the U.S., the servers can find the 200 most popular strings among a set of 400,000 client-held 256-bit strings in 54 minutes. Our protocols are highly parallelizable. We estimate that with 20 physical machines per logical server, our protocols could compute heavy hitters over ten million clients in just over one hour of computation.
{"title":"Lightweight Techniques for Private Heavy Hitters","authors":"D. Boneh, Elette Boyle, Henry Corrigan-Gibbs, N. Gilboa, Y. Ishai","doi":"10.1109/SP40001.2021.00048","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00048","url":null,"abstract":"This paper presents a new protocol for solving the private heavy-hitters problem. In this problem, there are many clients and a small set of data-collection servers. Each client holds a private bitstring. The servers want to recover the set of all popular strings, without learning anything else about any client’s string. A web-browser vendor, for instance, can use our protocol to figure out which homepages are popular, without learning any user’s homepage. We also consider the simpler private subset-histogram problem, in which the servers want to count how many clients hold strings in a particular set without revealing this set to the clients.Our protocols use two data-collection servers and, in a protocol run, each client send sends only a single message to the servers. Our protocols protect client privacy against arbitrary misbehavior by one of the servers and our approach requires no public-key cryptography (except for secure channels), nor general-purpose multiparty computation. Instead, we rely on incremental distributed point functions, a new cryptographic tool that allows a client to succinctly secret-share the labels on the nodes of an exponentially large binary tree, provided that the tree has a single non-zero path. Along the way, we develop new general tools for providing malicious security in applications of distributed point functions.A limitation of our heavy-hitters protocol is that it reveals to the servers slightly more information than the set of popular strings itself. We precisely define and quantify this leakage and explain how to ameliorate its effects. In an experimental evaluation with two servers on opposite sides of the U.S., the servers can find the 200 most popular strings among a set of 400,000 client-held 256-bit strings in 54 minutes. Our protocols are highly parallelizable. We estimate that with 20 physical machines per logical server, our protocols could compute heavy hitters over ten million clients in just over one hour of computation.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"56 1","pages":"762-776"},"PeriodicalIF":0.0,"publicationDate":"2020-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75470417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-14. DOI: 10.1109/SP40001.2021.00054
Amit Klein
We analyze the prandom pseudo-random number generator (PRNG) in use in the Linux kernel (which is the kernel of the Linux operating system, as well as of Android) and demonstrate that this PRNG is weak. The prandom PRNG is in use by many "consumers" in the Linux kernel. We focused on three consumers at the network level – the UDP source port generation algorithm, the IPv6 flow label generation algorithm and the IPv4 ID generation algorithm. The flawed prandom PRNG is shared by all these consumers, which enables us to mount "cross layer attacks" against the Linux kernel. In these attacks, we infer the internal state of the prandom PRNG from one OSI layer, and use it to either predict the values of the PRNG employed by the other OSI layer, or to correlate it to an internal state of the PRNG inferred from the other protocol. Using this approach we can mount a very efficient DNS cache poisoning attack against Linux. We collect TCP/IPv6 flow label values, or UDP source ports, or TCP/IPv4 IP ID values, reconstruct the internal PRNG state, then predict an outbound DNS query UDP source port, which speeds up the attack by a factor of 3,000 to 6,000. This attack works remotely, but can also be mounted locally, across Linux users and across containers, and (depending on the stub resolver) can poison the cache with an arbitrary DNS record. Additionally, we can identify and track Linux and Android devices – we collect TCP/IPv6 flow label values and/or UDP source port values and/or TCP/IPv4 ID fields, reconstruct the PRNG internal state and correlate this new state to previously extracted PRNG states to identify the same device.
{"title":"Cross Layer Attacks and How to Use Them (for DNS Cache Poisoning, Device Tracking and More)","authors":"Amit Klein","doi":"10.1109/SP40001.2021.00054","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00054","url":null,"abstract":"We analyze the prandom pseudo random number generator (PRNG) in use in the Linux kernel (which is the kernel of the Linux operating system, as well as of Android) and demonstrate that this PRNG is weak. The prandom PRNG is in use by many \"consumers\" in the Linux kernel. We focused on three consumers at the network level – the UDP source port generation algorithm, the IPv6 flow label generation algorithm and the IPv4 ID generation algorithm. The flawed prandom PRNG is shared by all these consumers, which enables us to mount \"cross layer attacks\" against the Linux kernel. In these attacks, we infer the internal state of the prandom PRNG from one OSI layer, and use it to either predict the values of the PRNG employed by the other OSI layer, or to correlate it to an internal state of the PRNG inferred from the other protocol.Using this approach we can mount a very efficient DNS cache poisoning attack against Linux. We collect TCP/IPv6 flow label values, or UDP source ports, or TCP/IPv4 IP ID values, reconstruct the internal PRNG state, then predict an outbound DNS query UDP source port, which speeds up the attack by a factor of x3000 to x6000. This attack works remotely, but can also be mounted locally, across Linux users and across containers, and (depending on the stub resolver) can poison the cache with an arbitrary DNS record. Additionally, we can identify and track Linux and Android devices – we collect TCP/IPv6 flow label values and/or UDP source port values and/or TCP/IPv4 ID fields, reconstruct the PRNG internal state and correlate this new state to previously extracted PRNG states to identify the same device.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"15 1","pages":"1179-1196"},"PeriodicalIF":0.0,"publicationDate":"2020-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81170560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}