Pub Date: 2021-03-31. DOI: 10.1109/SP40001.2021.00075
Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, G. Fanti, S. Bhat
Fringe groups and organizations have a long history of using euphemisms—ordinary-sounding words with a secret meaning—to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for enforcing policy automatically rely on keyword searches for words on a "ban list", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives [1]. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider "pot" (storage container or marijuana?) or "heater" (household appliance or firearm?). The current generation of social media companies instead hire staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind. This paper will demonstrate unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically, and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our algorithm for detecting euphemisms achieves 30–400% higher detection accuracies of unlabeled euphemisms in a text corpus. Our algorithm for revealing euphemistic meanings of words is the first of its kind, as far as we are aware. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.
{"title":"Self-Supervised Euphemism Detection and Identification for Content Moderation","authors":"Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, G. Fanti, S. Bhat","doi":"10.1109/SP40001.2021.00075","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00075","url":null,"abstract":"Fringe groups and organizations have a long history of using euphemisms—ordinary-sounding words with a secret meaning—to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for enforcing policy automatically rely on keyword searches for words on a \"ban list\", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives [1]. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider \"pot\" (storage container or marijuana?) or \"heater\" (household appliance or firearm?) The current generation of social media companies instead hire staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind.This paper will demonstrate unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically, and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our algorithm for detecting euphemisms achieves 30–400% higher detection accuracies of unlabeled euphemisms in a text corpus. Our algorithm for revealing euphemistic meanings of words is the first of its kind, as far as we are aware. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"4 1","pages":"229-246"},"PeriodicalIF":0.0,"publicationDate":"2021-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88324330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-15. DOI: 10.1109/SP40001.2021.00084
Ethan Cecchetti, Siqiu Yao, Haobin Ni, A. Myers
The disastrous vulnerabilities in smart contracts sharply remind us of our ignorance: we do not know how to write code that is secure in composition with malicious code. Information flow control has long been proposed as a way to achieve compositional security, offering strong guarantees even when combining software from different trust domains. Unfortunately, this appealing story breaks down in the presence of reentrancy attacks. We formalize a general definition of reentrancy and introduce a security condition that allows software modules like smart contracts to protect their key invariants while retaining the expressive power of safe forms of reentrancy. We present a security type system that provably enforces secure information flow; in conjunction with run-time mechanisms, it enforces secure reentrancy even in the presence of unknown code; and it helps locate and correct recent high-profile vulnerabilities.
{"title":"Compositional Security for Reentrant Applications","authors":"Ethan Cecchetti, Siqiu Yao, Haobin Ni, A. Myers","doi":"10.1109/SP40001.2021.00084","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00084","url":null,"abstract":"The disastrous vulnerabilities in smart contracts sharply remind us of our ignorance: we do not know how to write code that is secure in composition with malicious code. Information flow control has long been proposed as a way to achieve compositional security, offering strong guarantees even when combining software from different trust domains. Unfortunately, this appealing story breaks down in the presence of reentrancy attacks. We formalize a general definition of reentrancy and introduce a security condition that allows software modules like smart contracts to protect their key invariants while retaining the expressive power of safe forms of reentrancy. We present a security type system that provably enforces secure information flow; in conjunction with run-time mechanisms, it enforces secure reentrancy even in the presence of unknown code; and it helps locate and correct recent high-profile vulnerabilities.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"49 1","pages":"1249-1267"},"PeriodicalIF":0.0,"publicationDate":"2021-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78177306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-09. DOI: 10.1109/SP40001.2021.00106
Hengrui Jia, Mohammad Yaghini, Christopher A. Choquette-Choo, Natalie Dullerud, Anvith Thudi, Varun Chandrasekaran, Nicolas Papernot
Training machine learning (ML) models typically involves expensive iterative optimization. Once the model’s final parameters are released, there is currently no mechanism for the entity which trained the model to prove that these parameters were indeed the result of this optimization procedure. Such a mechanism would support security of ML applications in several ways. For instance, it would simplify ownership resolution when multiple parties contest ownership of a specific model. It would also facilitate distributed training across untrusted workers, where Byzantine workers might otherwise mount a denial-of-service by returning incorrect model updates. In this paper, we remediate this problem by introducing the concept of proof-of-learning in ML. Inspired by research on both proof-of-work and verified computations, we observe how a seminal training algorithm, stochastic gradient descent, accumulates secret information due to its stochasticity. This produces a natural construction for a proof-of-learning which demonstrates that a party has expended the compute required to obtain a set of model parameters correctly. In particular, our analyses and experiments show that an adversary seeking to illegitimately manufacture a proof-of-learning needs to perform at least as much work as is needed for gradient descent itself. We also instantiate a concrete proof-of-learning mechanism in both of the scenarios described above. In model ownership resolution, it protects the intellectual property of models released publicly. In distributed training, it preserves availability of the training procedure. Our empirical evaluation validates that our proof-of-learning mechanism is robust to variance induced by the hardware (e.g., ML accelerators) and software stacks.
{"title":"Proof-of-Learning: Definitions and Practice","authors":"Hengrui Jia, Mohammad Yaghini, Christopher A. Choquette-Choo, Natalie Dullerud, Anvith Thudi, Varun Chandrasekaran, Nicolas Papernot","doi":"10.1109/SP40001.2021.00106","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00106","url":null,"abstract":"Training machine learning (ML) models typically involves expensive iterative optimization. Once the model’s final parameters are released, there is currently no mechanism for the entity which trained the model to prove that these parameters were indeed the result of this optimization procedure. Such a mechanism would support security of ML applications in several ways. For instance, it would simplify ownership resolution when multiple parties contest ownership of a specific model. It would also facilitate the distributed training across untrusted workers where Byzantine workers might otherwise mount a denial-ofservice by returning incorrect model updates.In this paper, we remediate this problem by introducing the concept of proof-of-learning in ML. Inspired by research on both proof-of-work and verified computations, we observe how a seminal training algorithm, stochastic gradient descent, accumulates secret information due to its stochasticity. This produces a natural construction for a proof-of-learning which demonstrates that a party has expended the compute require to obtain a set of model parameters correctly. In particular, our analyses and experiments show that an adversary seeking to illegitimately manufacture a proof-of-learning needs to perform at least as much work than is needed for gradient descent itself.We also instantiate a concrete proof-of-learning mechanism in both of the scenarios described above. In model ownership resolution, it protects the intellectual property of models released publicly. In distributed training, it preserves availability of the training procedure. Our empirical evaluation validates that our proof-of-learning mechanism is robust to variance induced by the hardware (e.g., ML accelerators) and software stacks.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"87 1","pages":"1039-1056"},"PeriodicalIF":0.0,"publicationDate":"2021-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76096967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-03. DOI: 10.1109/SP40001.2021.00113
Liyi Zhou, Kaihua Qin, Antoine Cully, B. Livshits, Arthur Gervais
Decentralized Finance (DeFi) is a blockchain-asset-enabled finance ecosystem with millions of daily USD transaction volume, billions of USD locked up, and a plethora of newly emerging protocols (for lending, staking, and exchanges). Because all transactions, user balances, and total value locked in DeFi are publicly readable, a natural question that arises is: how can we automatically craft profitable transactions across the intertwined DeFi platforms? In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DeFiPoser-ARB and then create logical DeFi protocol models for a theorem prover in DeFiPoser-SMT. While DeFiPoser-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DeFiPoser-SMT can detect more complicated profitable transactions. We estimate that DeFiPoser-ARB and DeFiPoser-SMT can generate an average weekly revenue of 191.48 ETH (76,592 USD) and 72.44 ETH (28,976 USD) respectively, with the highest transaction revenue being 81.31 ETH (32,524 USD) and 22.40 ETH (8,960 USD) respectively. We further show that DeFiPoser-SMT finds the known economic bZx attack from February 2020, which yielded 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days and considers 96 DeFi protocol actions and 25 assets. Looking beyond the financial gains mentioned above, forks deteriorate blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DeFiPoser-ARB and DeFiPoser-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874×. Given optimal adversarial strategies provided by a Markov Decision Process (MDP), we quantify the value threshold at which a profitable transaction qualifies as Miner Extractable Value (MEV) and would incentivize MEV-aware miners to fork the blockchain. For instance, we find that on Ethereum, a miner with a hash rate of 10% would fork the blockchain if an MEV opportunity exceeds 4× the block reward.
{"title":"On the Just-In-Time Discovery of Profit-Generating Transactions in DeFi Protocols","authors":"Liyi Zhou, Kaihua Qin, Antoine Cully, B. Livshits, Arthur Gervais","doi":"10.1109/SP40001.2021.00113","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00113","url":null,"abstract":"Decentralized Finance (DeFi) is a blockchain-asset-enabled finance ecosystem with millions of daily USD transaction volume, billions of locked up USD, as well as a plethora of newly emerging protocols (for lending, staking, and exchanges). Because all transactions, user balances, and total value locked in DeFi are publicly readable, a natural question that arises is: how can we automatically craft profitable transactions across the intertwined DeFi platforms?In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DeFiPoser-ARB and then create logical DeFi protocol models for a theorem prover in DeFiPoser-SMT. While DeFiPoser-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DeFiPoser-SMT can detect more complicated profitable transactions. We estimate that DeFiPoser-ARB and DeFiPoser-SMT can generate an average weekly revenue of 191.48 ETH (76,592 USD) and 72.44 ETH (28,976 USD) respectively, with the highest transaction revenue being 81.31 ETH (32,524 USD) and 22.40 ETH (8,960 USD) respectively. We further show that DeFiPoser-SMT finds the known economic bZx attack from February 2020, which yields 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days, given 96 DeFi protocol actions, and 25 assets.Looking beyond the financial gains mentioned above, forks deteriorate the blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DeFiPoser-ARB and DeFiPoser-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874×. Given optimal adversarial strategies provided by a Markov Decision Process (MDP), we quantify the value threshold at which a profitable transaction qualifies as Miner Extractable Value (MEV) and would incentivize MEV-aware miners to fork the blockchain. For instance, we find that on Ethereum, a miner with a hash rate of 10% would fork the blockchain if an MEV opportunity exceeds 4× the block reward.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"10 1","pages":"919-936"},"PeriodicalIF":0.0,"publicationDate":"2021-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84505002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-18. DOI: 10.1109/SP40001.2021.00068
Alexander Viand, Patrick Jattke, Anwar Hithnawi
Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. Hence, it provides resilience in situations where computations are carried out by an untrusted or potentially compromised party. This powerful concept was first conceived by Rivest et al. in the 1970s. However, it remained unrealized until Craig Gentry presented the first feasible FHE scheme in 2009. The massive collection of sensitive data in cloud services, coupled with a plague of data breaches, has moved highly regulated businesses to increasingly demand confidential and secure computing solutions. This demand, in turn, has led to a recent surge in the development of FHE tools. To understand the landscape of recent FHE tool developments, we conduct an extensive survey and experimental evaluation to explore the current state of the art and identify areas for future development. In this paper, we survey, evaluate, and systematize FHE tools and compilers. We perform experiments to evaluate these tools’ performance and usability aspects on a variety of applications. We conclude with recommendations for developers intending to develop FHE-based applications and a discussion on future directions for FHE tools development.
{"title":"SoK: Fully Homomorphic Encryption Compilers","authors":"Alexander Viand, Patrick Jattke, Anwar Hithnawi","doi":"10.1109/SP40001.2021.00068","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00068","url":null,"abstract":"Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. Hence, it provides resilience in situations where computations are carried out by an untrusted or potentially compromised party. This powerful concept was first conceived by Rivest et al. in the 1970s. However, it remained unrealized until Craig Gentry presented the first feasible FHE scheme in 2009.The advent of the massive collection of sensitive data in cloud services, coupled with a plague of data breaches, moved highly regulated businesses to increasingly demand confidential and secure computing solutions. This demand, in turn, has led to a recent surge in the development of FHE tools. To understand the landscape of recent FHE tool developments, we conduct an extensive survey and experimental evaluation to explore the current state of the art and identify areas for future development.In this paper, we survey, evaluate, and systematize FHE tools and compilers. We perform experiments to evaluate these tools’ performance and usability aspects on a variety of applications. We conclude with recommendations for developers intending to develop FHE-based applications and a discussion on future directions for FHE tools development.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"105 1","pages":"1092-1108"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85830037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-11. DOI: 10.1109/SP40001.2021.00069
Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini
Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D′ that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private. Hence, the purpose of privacy analysis is to upper bound the probability that any adversary could successfully guess which dataset the model was trained on. In our paper, we instantiate this hypothetical adversary in order to establish lower bounds on the probability that this distinguishing game can be won. We use this adversary to evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms. For DP-SGD, the most common method for training neural networks with differential privacy, our lower bounds are tight and match the theoretical upper bound. This implies that in order to prove better upper bounds, it will be necessary to make use of additional assumptions. Fortunately, we find that our attacks are significantly weaker when additional (realistic) restrictions are put in place on the adversary's capabilities. Thus, in the practical setting common to many real-world deployments, there is a gap between our lower bounds and the upper bounds provided by the analysis: differential privacy is conservative and adversaries may not be able to leak as much information as suggested by the theoretical bound.
{"title":"Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning","authors":"Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini","doi":"10.1109/SP40001.2021.00069","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00069","url":null,"abstract":"Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D′ that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private. Hence, the purpose of privacy analysis is to upper bound the probability that any adversary could successfully guess which dataset the model was trained on.In our paper, we instantiate this hypothetical adversary in order to establish lower bounds on the probability that this distinguishing game can be won. We use this adversary to evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms.For DP-SGD, the most common method for training neural networks with differential privacy, our lower bounds are tight and match the theoretical upper bound. This implies that in order to prove better upper bounds, it will be necessary to make use of additional assumptions. Fortunately, we find that our attacks are significantly weaker when additional (realistic) restrictions are put in place on the adversary's capabilities. Thus, in the practical setting common to many real-world deployments, there is a gap between our lower bounds and the upper bounds provided by the analysis: differential privacy is conservative and adversaries may not be able to leak as much information as suggested by the theoretical bound.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"37 1","pages":"866-882"},"PeriodicalIF":0.0,"publicationDate":"2021-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84688382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-06. DOI: 10.1109/SP40001.2021.00057
T. D. Nguyen, Long H. Pham, Jun Sun
Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutions and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on their applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essential that they are guaranteed to be free of vulnerabilities. Unfortunately, smart contract languages such as Solidity are Turing-complete, which implies that verifying them statically is infeasible. Thus, alternative approaches must be developed to provide the guarantee. In this work, we develop an approach which automatically transforms smart contracts so that they are provably free of four common kinds of vulnerabilities. The key idea is to apply run-time verification in an efficient and provably correct manner. Experimental results on 5,000 smart contracts show that our approach incurs minor run-time overhead in terms of time (14.79%) and gas (0.79%).
{"title":"SGUARD: Towards Fixing Vulnerable Smart Contracts Automatically","authors":"T. D. Nguyen, Long H. Pham, Jun Sun","doi":"10.1109/SP40001.2021.00057","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00057","url":null,"abstract":"Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutes and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on its applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essential that smart contracts are guaranteed to be free of vulnerabilities. Unfortunately, smart contract languages such as Solidity are Turing-complete, which implies that verifying them statically is infeasible. Thus, alternative approaches must be developed to provide the guarantee. In this work, we develop an approach which automatically transforms smart contracts so that they are provably free of 4 common kinds of vulnerabilities. The key idea is to apply run-time verification in an efficient and provably correct manner. Experiment results with 5000 smart contracts show that our approach incurs minor run-time overhead in terms of time (i.e., 14.79%) and gas (i.e., 0.79%).","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"74 1","pages":"1215-1229"},"PeriodicalIF":0.0,"publicationDate":"2021-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90980611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-04. DOI: 10.1109/SP40001.2021.00060
Subhajit Roy, Justin Hsu, Aws Albarghouthi
Differential privacy is a formal, mathematical definition of data privacy that has gained traction in academia, industry, and government. The task of correctly constructing differentially private algorithms is non-trivial, and mistakes have been made in foundational algorithms. Currently, there is no automated support for converting an existing, non-private program into a differentially private version. In this paper, we propose a technique for automatically learning an accurate and differentially private version of a given non-private program. We show how to solve this difficult program synthesis problem via a combination of techniques: carefully picking representative example inputs, reducing the problem to continuous optimization, and mapping the results back to symbolic expressions. We demonstrate that our approach is able to learn foundational algorithms from the differential privacy literature and significantly outperforms natural program synthesis baselines.
{"title":"Learning Differentially Private Mechanisms","authors":"Subhajit Roy, Justin Hsu, Aws Albarghouthi","doi":"10.1109/SP40001.2021.00060","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00060","url":null,"abstract":"Differential privacy is a formal, mathematical definition of data privacy that has gained traction in academia, industry, and government. The task of correctly constructing differentially private algorithms is non-trivial, and mistakes have been made in foundational algorithms. Currently, there is no automated support for converting an existing, non-private program into a differentially private version. In this paper, we propose a technique for automatically learning an accurate and differentially private version of a given non-private program. We show how to solve this difficult program synthesis problem via a combination of techniques: carefully picking representative example inputs, reducing the problem to continuous optimization, and mapping the results back to symbolic expressions. We demonstrate that our approach is able to learn foundational algorithms from the differential privacy literature and significantly outperforms natural program synthesis baselines.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"238 1","pages":"852-865"},"PeriodicalIF":0.0,"publicationDate":"2021-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72744218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-29. DOI: 10.1109/SP40001.2021.00048
D. Boneh, Elette Boyle, Henry Corrigan-Gibbs, N. Gilboa, Y. Ishai
This paper presents a new protocol for solving the private heavy-hitters problem. In this problem, there are many clients and a small set of data-collection servers. Each client holds a private bitstring. The servers want to recover the set of all popular strings, without learning anything else about any client’s string. A web-browser vendor, for instance, can use our protocol to figure out which homepages are popular, without learning any user’s homepage. We also consider the simpler private subset-histogram problem, in which the servers want to count how many clients hold strings in a particular set without revealing this set to the clients. Our protocols use two data-collection servers and, in a protocol run, each client sends only a single message to the servers. Our protocols protect client privacy against arbitrary misbehavior by one of the servers and our approach requires no public-key cryptography (except for secure channels), nor general-purpose multiparty computation. Instead, we rely on incremental distributed point functions, a new cryptographic tool that allows a client to succinctly secret-share the labels on the nodes of an exponentially large binary tree, provided that the tree has a single non-zero path. Along the way, we develop new general tools for providing malicious security in applications of distributed point functions. A limitation of our heavy-hitters protocol is that it reveals to the servers slightly more information than the set of popular strings itself. We precisely define and quantify this leakage and explain how to ameliorate its effects. In an experimental evaluation with two servers on opposite sides of the U.S., the servers can find the 200 most popular strings among a set of 400,000 client-held 256-bit strings in 54 minutes. Our protocols are highly parallelizable. We estimate that with 20 physical machines per logical server, our protocols could compute heavy hitters over ten million clients in just over one hour of computation.
{"title":"Lightweight Techniques for Private Heavy Hitters","authors":"D. Boneh, Elette Boyle, Henry Corrigan-Gibbs, N. Gilboa, Y. Ishai","doi":"10.1109/SP40001.2021.00048","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00048","url":null,"abstract":"This paper presents a new protocol for solving the private heavy-hitters problem. In this problem, there are many clients and a small set of data-collection servers. Each client holds a private bitstring. The servers want to recover the set of all popular strings, without learning anything else about any client’s string. A web-browser vendor, for instance, can use our protocol to figure out which homepages are popular, without learning any user’s homepage. We also consider the simpler private subset-histogram problem, in which the servers want to count how many clients hold strings in a particular set without revealing this set to the clients.Our protocols use two data-collection servers and, in a protocol run, each client send sends only a single message to the servers. Our protocols protect client privacy against arbitrary misbehavior by one of the servers and our approach requires no public-key cryptography (except for secure channels), nor general-purpose multiparty computation. Instead, we rely on incremental distributed point functions, a new cryptographic tool that allows a client to succinctly secret-share the labels on the nodes of an exponentially large binary tree, provided that the tree has a single non-zero path. Along the way, we develop new general tools for providing malicious security in applications of distributed point functions.A limitation of our heavy-hitters protocol is that it reveals to the servers slightly more information than the set of popular strings itself. We precisely define and quantify this leakage and explain how to ameliorate its effects. In an experimental evaluation with two servers on opposite sides of the U.S., the servers can find the 200 most popular strings among a set of 400,000 client-held 256-bit strings in 54 minutes. Our protocols are highly parallelizable. We estimate that with 20 physical machines per logical server, our protocols could compute heavy hitters over ten million clients in just over one hour of computation.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"56 1","pages":"762-776"},"PeriodicalIF":0.0,"publicationDate":"2020-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75470417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-14. DOI: 10.1109/SP40001.2021.00054
Amit Klein
We analyze the prandom pseudo-random number generator (PRNG) in use in the Linux kernel (which is the kernel of the Linux operating system, as well as of Android) and demonstrate that this PRNG is weak. The prandom PRNG is in use by many "consumers" in the Linux kernel. We focused on three consumers at the network level – the UDP source port generation algorithm, the IPv6 flow label generation algorithm and the IPv4 ID generation algorithm. The flawed prandom PRNG is shared by all these consumers, which enables us to mount "cross layer attacks" against the Linux kernel. In these attacks, we infer the internal state of the prandom PRNG from one OSI layer, and use it to either predict the values of the PRNG employed by the other OSI layer, or to correlate it to an internal state of the PRNG inferred from the other protocol. Using this approach we can mount a very efficient DNS cache poisoning attack against Linux. We collect TCP/IPv6 flow label values, or UDP source ports, or TCP/IPv4 IP ID values, reconstruct the internal PRNG state, then predict an outbound DNS query UDP source port, which speeds up the attack by a factor of 3,000 to 6,000. This attack works remotely, but can also be mounted locally, across Linux users and across containers, and (depending on the stub resolver) can poison the cache with an arbitrary DNS record. Additionally, we can identify and track Linux and Android devices – we collect TCP/IPv6 flow label values and/or UDP source port values and/or TCP/IPv4 ID fields, reconstruct the PRNG internal state and correlate this new state to previously extracted PRNG states to identify the same device.
{"title":"Cross Layer Attacks and How to Use Them (for DNS Cache Poisoning, Device Tracking and More)","authors":"Amit Klein","doi":"10.1109/SP40001.2021.00054","DOIUrl":"https://doi.org/10.1109/SP40001.2021.00054","url":null,"abstract":"We analyze the prandom pseudo random number generator (PRNG) in use in the Linux kernel (which is the kernel of the Linux operating system, as well as of Android) and demonstrate that this PRNG is weak. The prandom PRNG is in use by many \"consumers\" in the Linux kernel. We focused on three consumers at the network level – the UDP source port generation algorithm, the IPv6 flow label generation algorithm and the IPv4 ID generation algorithm. The flawed prandom PRNG is shared by all these consumers, which enables us to mount \"cross layer attacks\" against the Linux kernel. In these attacks, we infer the internal state of the prandom PRNG from one OSI layer, and use it to either predict the values of the PRNG employed by the other OSI layer, or to correlate it to an internal state of the PRNG inferred from the other protocol.Using this approach we can mount a very efficient DNS cache poisoning attack against Linux. We collect TCP/IPv6 flow label values, or UDP source ports, or TCP/IPv4 IP ID values, reconstruct the internal PRNG state, then predict an outbound DNS query UDP source port, which speeds up the attack by a factor of x3000 to x6000. This attack works remotely, but can also be mounted locally, across Linux users and across containers, and (depending on the stub resolver) can poison the cache with an arbitrary DNS record. Additionally, we can identify and track Linux and Android devices – we collect TCP/IPv6 flow label values and/or UDP source port values and/or TCP/IPv4 ID fields, reconstruct the PRNG internal state and correlate this new state to previously extracted PRNG states to identify the same device.","PeriodicalId":6786,"journal":{"name":"2021 IEEE Symposium on Security and Privacy (SP)","volume":"15 1","pages":"1179-1196"},"PeriodicalIF":0.0,"publicationDate":"2020-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81170560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}