Spectral-DP: Differentially Private Deep Learning through Spectral Perturbation and Filtering
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.00171
Ce Feng, Nuo Xu, Wujie Wen, P. Venkitasubramaniam, Caiwen Ding
Differential privacy is a widely accepted measure of privacy in the context of deep learning algorithms, and achieving it relies on a noisy training approach known as differentially private stochastic gradient descent (DP-SGD). Because DP-SGD requires direct noise addition to every gradient in a dense neural network, privacy is achieved at a significant utility cost. In this work, we present Spectral-DP, a new differentially private learning approach that combines gradient perturbation in the spectral domain with spectral filtering to achieve a desired privacy guarantee with a lower noise scale and thus better utility. We develop differentially private deep learning methods based on Spectral-DP for architectures that contain both convolution and fully connected layers. In particular, for fully connected layers, we combine a block-circulant-based spatial restructuring with Spectral-DP to achieve better utility. Through comprehensive experiments, we study and provide guidelines to implement Spectral-DP deep learning on benchmark datasets. In comparison with state-of-the-art DP-SGD based approaches, Spectral-DP is shown to have uniformly better utility in both training-from-scratch and transfer learning settings.
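To make the core mechanism concrete, here is a minimal numpy sketch of gradient perturbation in the spectral domain followed by spectral filtering. It is an illustrative toy, not the paper's algorithm: the clipping rule, the low-frequency filtering heuristic, and the parameter names (clip_norm, sigma, keep_ratio) are assumptions for exposition.

```python
import numpy as np

def spectral_perturb(grad, clip_norm=1.0, sigma=1.0, keep_ratio=0.5, rng=None):
    """Toy sketch: clip a per-sample gradient, perturb it in the frequency
    domain, keep only a fraction of the (noisy) coefficients, transform back."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-sample clipping to bound sensitivity, as in standard DP-SGD.
    g = grad * min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
    # Perturbation in the spectral domain.
    spec = np.fft.rfft(g)
    noise = rng.normal(0.0, sigma * clip_norm, spec.shape) \
            + 1j * rng.normal(0.0, sigma * clip_norm, spec.shape)
    spec_noisy = spec + noise
    # Spectral filtering: retain only the lowest-frequency coefficients.
    k = max(1, int(keep_ratio * spec_noisy.size))
    spec_noisy[k:] = 0.0
    return np.fft.irfft(spec_noisy, n=g.size)

# Example: perturb one random per-sample gradient.
g_priv = spectral_perturb(np.random.randn(128), clip_norm=1.0, sigma=1.0, keep_ratio=0.25)
```

The intuition behind the utility gain is that the discarded coefficients take their share of the injected noise with them, so less total noise survives in the reconstructed gradient.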
{"title":"Spectral-DP: Differentially Private Deep Learning through Spectral Perturbation and Filtering","authors":"Ce Feng, Nuo Xu, Wujie Wen, P. Venkitasubramaniam, Caiwen Ding","doi":"10.1109/SP46215.2023.00171","DOIUrl":"https://doi.org/10.1109/SP46215.2023.00171","url":null,"abstract":"Differential privacy is a widely accepted measure of privacy in the context of deep learning algorithms, and achieving it relies on a noisy training approach known as differentially private stochastic gradient descent (DP-SGD). DP-SGD requires direct noise addition to every gradient in a dense neural network, the privacy is achieved at a significant utility cost. In this work, we present Spectral-DP, a new differentially private learning approach which combines gradient perturbation in the spectral domain with spectral filtering to achieve a desired privacy guarantee with a lower noise scale and thus better utility. We develop differentially private deep learning methods based on Spectral-DP for architectures that contain both convolution and fully connected layers. In particular, for fully connected layers, we combine a block-circulant based spatial restructuring with Spectral-DP to achieve better utility. Through comprehensive experiments, we study and provide guidelines to implement Spectral-DP deep learning on benchmark datasets. In comparison with state-of-the-art DP-SGD based approaches, Spectral-DP is shown to have uniformly better utility performance in both training from scratch and transfer learning settings.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126179109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discop: Provably Secure Steganography in Practice Based on "Distribution Copies"
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179287
Jinyang Ding, Kejiang Chen, Yaofei Wang, Na Zhao, Weiming Zhang, Neng H. Yu
Steganography is the act of disguising the transmission of secret information as seemingly innocent communication. Although provably secure steganography has been proposed for decades, it has not become mainstream in this field because its strict requirements (such as a perfect sampler and an explicit data distribution) are challenging to satisfy in traditional data environments. The growing popularity of deep generative models provides an excellent opportunity to solve this problem. Several methods attempting to achieve provably secure steganography based on deep generative models have been proposed in recent years. However, they cannot achieve the expected security in practice due to unrealistic conditions, such as the balanced grouping of discrete elements and a perfect match between the message and channel distributions. In this paper, we propose a new, practically applicable, provably secure steganography method named Discop, which constructs several "distribution copies" during the generation process. At each time step of generation, the message determines from which "distribution copy" to sample. As long as the receiver agrees on some shared information with the sender, the receiver can extract the message without error. To further improve the embedding rate, we recursively construct more "distribution copies" by creating Huffman trees. We prove that Discop strictly maintains the original distribution, so the adversary cannot perform better than random guessing. Moreover, we conduct experiments on multiple generation tasks for diverse digital media, and the results show that Discop’s security and efficiency outperform those of previous methods.
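A toy sketch of the "distribution copy" idea under simplifying assumptions: rotating a shared pseudorandom value before inverse-CDF sampling yields copies that are statistically identical to the original token distribution, and the secret bit selects which copy to sample from. The function names and the two-copy setup are illustrative; the actual method further recurses with Huffman trees to raise the embedding rate.

```python
import numpy as np

def sample_from_copy(probs, r, copy_idx, num_copies):
    """Inverse-CDF sampling from a rotated 'distribution copy'. Shifting the
    shared uniform value r by copy_idx/num_copies (mod 1) leaves the marginal
    token distribution unchanged, so every copy looks like the original."""
    cdf = np.cumsum(probs)
    u = (r + copy_idx / num_copies) % 1.0
    return int(np.searchsorted(cdf, u, side="right"))

def embed_bit(probs, r, bit):
    # Two distribution copies per step encode at most one bit.
    t0, t1 = sample_from_copy(probs, r, 0, 2), sample_from_copy(probs, r, 1, 2)
    if t0 == t1:
        return t0, False          # copies collide: emit the token, no bit consumed
    return (t1 if bit else t0), True

def extract_bit(probs, r, token):
    # The receiver shares probs and r, so it can test which copy produced the token.
    t0, t1 = sample_from_copy(probs, r, 0, 2), sample_from_copy(probs, r, 1, 2)
    if t0 == t1:
        return None               # no bit was embedded at this step
    return 0 if token == t0 else 1

# Example round trip over a toy 4-token distribution with a shared PRNG value.
probs = np.array([0.4, 0.3, 0.2, 0.1])
r = 0.37
token, used = embed_bit(probs, r, 1)
assert not used or extract_bit(probs, r, token) == 1
```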
{"title":"Discop: Provably Secure Steganography in Practice Based on \"Distribution Copies\"","authors":"Jinyang Ding, Kejiang Chen, Yaofei Wang, Na Zhao, Weiming Zhang, Neng H. Yu","doi":"10.1109/SP46215.2023.10179287","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179287","url":null,"abstract":"Steganography is the act of disguising the transmission of secret information as seemingly innocent. Although provably secure steganography has been proposed for decades, it has not been mainstream in this field because its strict requirements (such as a perfect sampler and an explicit data distribution) are challenging to satisfy in traditional data environments. The popularity of deep generative models is gradually increasing and can provide an excellent opportunity to solve this problem. Several methods attempting to achieve provably secure steganography based on deep generative models have been proposed in recent years. However, they cannot achieve the expected security in practice due to unrealistic conditions, such as the balanced grouping of discrete elements and a perfect match between the message and channel distributions. In this paper, we propose a new provably secure steganography method in practice named Discop, which constructs several \"distribution copies\" during the generation process. At each time step of generation, the message determines from which \"distribution copy\" to sample. As long as the receiver agrees on some shared information with the sender, he can extract the message without error. To further improve the embedding rate, we recursively construct more \"distribution copies\" by creating Huffman trees. We prove that Discop can strictly maintain the original distribution so that the adversary cannot perform better than random guessing. Moreover, we conduct experiments on multiple generation tasks for diverse digital media, and the results show that Discop’s security and efficiency outperform those of previous methods.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122933380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DepthFake: Spoofing 3D Face Authentication with a 2D Photo
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179429
Zhihao Wu, Yushi Cheng, Jiahui Yang, Xiaoyu Ji, Wenyuan Xu
Face authentication has been widely used in access control, and the latest 3D face authentication systems employ 3D liveness detection techniques to cope with photo replay attacks, whereby an attacker uses a 2D photo to bypass the authentication. In this paper, we analyze the security of 3D liveness detection systems that utilize structured light depth cameras and discover a new attack surface against 3D face authentication systems. We propose DepthFake attacks that can spoof 3D face authentication using only a single 2D photo. To achieve this goal, DepthFake first estimates the 3D depth information of a target victim’s face from the 2D photo. Then, DepthFake projects carefully crafted scatter patterns embedded with the face depth information, in order to empower the 2D photo with 3D authentication properties. We overcome a collection of practical challenges, e.g., depth estimation errors from 2D photos, depth image forgery based on structured light, and the alignment of the RGB and depth images of a face, and we implemented DepthFake in laboratory setups. We validated DepthFake on 3 commercial face authentication systems (i.e., Tencent Cloud, Baidu Cloud, and 3DiVi) and one commercial access control device. The results over 50 users demonstrate that DepthFake achieves an overall depth attack success rate of 79.4% and an RGB-D attack success rate of 59.4% in the real world.
{"title":"DepthFake: Spoofing 3D Face Authentication with a 2D Photo","authors":"Zhihao Wu, Yushi Cheng, Jiahui Yang, Xiaoyu Ji, Wenyuan Xu","doi":"10.1109/SP46215.2023.10179429","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179429","url":null,"abstract":"Face authentication has been widely used in access control, and the latest 3D face authentication systems employ 3D liveness detection techniques to cope with the photo replay attacks, whereby an attacker uses a 2D photo to bypass the authentication. In this paper, we analyze the security of 3D liveness detection systems that utilize structured light depth cameras and discover a new attack surface against 3D face authentication systems. We propose DepthFake attacks that can spoof a 3D face authentication using only one single 2D photo. To achieve this goal, DepthFake first estimates the 3D depth information of a target victim’s face from his 2D photo. Then, DepthFake projects the carefully-crafted scatter patterns embedded with the face depth information, in order to empower the 2D photo with 3D authentication properties. We overcome a collection of practical challenges, e.g., depth estimation errors from 2D photos, depth images forgery based on structured light, the alignment of the RGB image and depth images for a face, and implemented DepthFake in laboratory setups. We validated DepthFake on 3 commercial face authentication systems (i.e., Tencent Cloud, Baidu Cloud, and 3DiVi) and one commercial access control device. The results over 50 users demonstrate that DepthFake achieves an overall Depth attack success rate of 79.4% and RGB-D attack success rate of 59.4% in the real world.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"7 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120807902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing Everyday Misuse of Smart Home Devices
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179476
Phoebe Moh, P. Datta, N. Warford, Adam Bates, Nathan Malkin, Michelle L. Mazurek
Exploration of Internet of Things (IoT) security often focuses on threats posed by external and technically-skilled attackers. While it is important to understand these most extreme cases, it is equally important to understand the most likely risks of harm posed by smart device ownership. In this paper, we explore how smart devices are misused — used without permission in a manner that causes harm — by device owners’ everyday associates such as friends, family, and romantic partners. In a preliminary characterization survey (n = 100), we broadly capture the kinds of unauthorized use and misuse incidents participants have experienced or engaged in. Then, in a prevalence survey (n = 483), we assess the prevalence of these incidents in a demographically-representative population. Our findings show that unauthorized use of smart devices is widespread (experienced by 43% of participants), and that misuse is also common (experienced by at least 19% of participants). However, highly individual factors determine whether these unauthorized use events constitute misuse. Through a focus on everyday abuses, this work sheds light on the most prevalent security and privacy threats faced by smart-home owners today.
{"title":"Characterizing Everyday Misuse of Smart Home Devices","authors":"Phoebe Moh, P. Datta, N. Warford, Adam Bates, Nathan Malkin, Michelle L. Mazurek","doi":"10.1109/SP46215.2023.10179476","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179476","url":null,"abstract":"Exploration of Internet of Things (IoT) security often focuses on threats posed by external and technically-skilled attackers. While it is important to understand these most extreme cases, it is equally important to understand the most likely risks of harm posed by smart device ownership. In this paper, we explore how smart devices are misused — used without permission in a manner that causes harm — by device owners’ everyday associates such as friends, family, and romantic partners. In a preliminary characterization survey (n = 100), we broadly capture the kinds of unauthorized use and misuse incidents participants have experienced or engaged in. Then, in a prevalence survey (n = 483), we assess the prevalence of these incidents in a demographically-representative population. Our findings show that unauthorized use of smart devices is widespread (experienced by 43% of participants), and that misuse is also common (experienced by at least 19% of participants). However, highly individual factors determine whether these unauthorized use events constitute misuse. Through a focus on everyday abuses, this work sheds light on the most prevalent security and privacy threats faced by smart-home owners today.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127771379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SoK: Distributed Randomness Beacons
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179419
Kevin Choi, A. Manoj, Joseph Bonneau
Motivated and inspired by the emergence of blockchains, many new protocols have recently been proposed for generating publicly verifiable randomness in a distributed yet secure fashion. These protocols work under different setups and assumptions, use various cryptographic tools, and entail unique trade-offs and characteristics. In this paper, we systematize the design of distributed randomness beacons (DRBs) as well as the cryptographic building blocks they rely on. We evaluate protocols on two key security properties, unbiasability and unpredictability, and discuss common attack vectors for predicting or biasing the beacon output and the countermeasures employed by protocols. We also compare protocols by communication and computational efficiency. Finally, we provide insights on the applicability of different protocols in various deployment scenarios and highlight possible directions for further research.
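As a concrete (textbook, not paper-specific) point in this design space, a commit-reveal beacon combines participants' secrets so that any single honest contribution randomizes the output; its classic weakness, last-revealer bias, is one of the attack vectors such systematizations compare countermeasures against. The sketch below is a minimal illustration with hypothetical helper names.

```python
import hashlib
import secrets

def commit(value: bytes) -> bytes:
    # Binding commitment to a participant's secret contribution.
    return hashlib.sha256(value).digest()

def beacon_output(reveals):
    # XOR the hashed contributions; one honest (uniform) contribution
    # is enough to make the output unpredictable to the others.
    acc = bytes(32)
    for v in reveals:
        acc = bytes(a ^ b for a, b in zip(acc, hashlib.sha256(v).digest()))
    return acc

# Commit phase, then reveal phase with commitment checks.
contributions = [secrets.token_bytes(32) for _ in range(3)]
commitments = [commit(c) for c in contributions]
assert all(commit(c) == cm for c, cm in zip(contributions, commitments))
print(beacon_output(contributions).hex())

# Bias vector: the last participant to reveal already sees the others' values
# and can withhold its reveal if the would-be output is unfavourable, which is
# why practical protocols add timeouts, VDFs, or threshold techniques.
```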
{"title":"SoK: Distributed Randomness Beacons","authors":"Kevin Choi, A. Manoj, Joseph Bonneau","doi":"10.1109/SP46215.2023.10179419","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179419","url":null,"abstract":"Motivated and inspired by the emergence of blockchains, many new protocols have recently been proposed for generating publicly verifiable randomness in a distributed yet secure fashion. These protocols work under different setups and assumptions, use various cryptographic tools, and entail unique trade-offs and characteristics. In this paper, we systematize the design of distributed randomness beacons (DRBs) as well as the cryptographic building blocks they rely on. We evaluate protocols on two key security properties, unbiasability and unpredictability, and discuss common attack vectors for predicting or biasing the beacon output and the countermeasures employed by protocols. We also compare protocols by communication and computational efficiency. Finally, we provide insights on the applicability of different protocols in various deployment scenarios and highlight possible directions for further research.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133792929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
REGA: Scalable Rowhammer Mitigation with Refresh-Generating Activations
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179327
Michele Marazzi, Flavien Solt, Patrick Jattke, Kubo Takashi, Kaveh Razavi
Mitigating Rowhammer requires performing additional refresh operations to recharge DRAM rows before bits start to flip. These refreshes are scarce and can only happen periodically, impeding the design of effective mitigations as newer DRAM substrates become more vulnerable to Rowhammer, and more "victim" rows are affected by a single "aggressor" row. We introduce REGA, the first in-DRAM mechanism that can generate extra refresh operations each time a row is activated. Since row activations are the sole cause of Rowhammer, these extra refreshes become available as soon as the DRAM device faces Rowhammer-inducing activations. Refresh operations are traditionally performed using sense amplifiers. Sense amplifiers, however, are also in charge of handling the read and write operations. Consequently, the sense amplifiers cannot be used for refreshing rows during data transfers. To enable refresh operations in parallel to data transfers, REGA uses additional low-overhead buffering sense amplifiers for the sole purpose of data transfers. REGA can then use the original sense amplifiers for parallel refresh operations of other rows during row activations. The refreshes generated by REGA enable the design of simple and scalable in-DRAM mitigations with strong security guarantees. As an example, we build REGAM, the first deterministic in-DRAM mitigation that scales to small Rowhammer thresholds while remaining agnostic to the number of victims per aggressor. REGAM has a constant 2.1% area overhead, and can protect DDR5 devices with Rowhammer thresholds as small as 261, 517, and 1029 with 23.9%, 11.5%, and 4.7% more power, and 3.7%, 0.8% and 0% performance overhead.
{"title":"REGA: Scalable Rowhammer Mitigation with Refresh-Generating Activations","authors":"Michele Marazzi, Flavien Solt, Patrick Jattke, Kubo Takashi, Kaveh Razavi","doi":"10.1109/SP46215.2023.10179327","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179327","url":null,"abstract":"Mitigating Rowhammer requires performing additional refresh operations to recharge DRAM rows before bits start to flip. These refreshes are scarce and can only happen periodically, impeding the design of effective mitigations as newer DRAM substrates become more vulnerable to Rowhammer, and more \"victim\" rows are affected by a single \"aggressor\" row.We introduce REGA, the first in-DRAM mechanism that can generate extra refresh operations each time a row is activated. Since row activations are the sole cause of Rowhammer, these extra refreshes become available as soon as the DRAM device faces Rowhammer-inducing activations. Refresh operations are traditionally performed using sense amplifiers. Sense amplifiers, however, are also in charge of handling the read and write operations. Consequently, the sense amplifiers cannot be used for refreshing rows during data transfers. To enable refresh operations in parallel to data transfers, REGA uses additional low-overhead buffering sense amplifiers for the sole purpose of data transfers. REGA can then use the original sense amplifiers for parallel refresh operations of other rows during row activations.The refreshes generated by REGA enable the design of simple and scalable in-DRAM mitigations with strong security guarantees. As an example, we build REGAM, the first deterministic in-DRAM mitigation that scales to small Rowhammer thresholds while remaining agnostic to the number of victims per aggressor. REGAM has a constant 2.1% area overhead, and can protect DDR5 devices with Rowhammer thresholds as small as 261, 517, and 1029 with 23.9%, 11.5%, and 4.7% more power, and 3.7%, 0.8% and 0% performance overhead.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131210493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical Program Modularization with Type-Based Dependence Analysis
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179412
Kangjie Lu
Today's software programs are bloated and have become extremely complex. As there is typically no internal isolation among modules in a program, a vulnerability can be exploited to corrupt memory and take control of the whole program. Program modularization is thus a promising security mechanism that splits a complex program into smaller modules, so that memory-access instructions can be constrained from corrupting irrelevant modules. A general approach to realizing program modularization is dependence analysis, which determines if an instruction is independent of specific code or data; if so, it can be modularized. Unfortunately, dependence analysis in complex programs is generally considered infeasible, due to problems in data-flow analysis such as unknown indirect-call targets, pointer aliasing, and path explosion. As a result, we have not seen practical automated program modularization built on dependence analysis. This paper presents a breakthrough: Type-based dependence analysis for Program Modularization (TyPM). Its goal is to determine which modules in a program can never pass a type of object (including references) to a memory-access instruction; therefore, objects of this type that are created by these modules can never be valid targets of the instruction. The idea is to employ a type-based analysis to first determine which types of data flows can take place between two modules, and then transitively resolve all dependent modules of a memory-access instruction with respect to the specific type. Such an approach avoids data-flow analysis and can be practical. We develop two important security applications based on TyPM: refining indirect-call targets and protecting critical data structures. We extensively evaluate TyPM with various system software, including an OS kernel, a hypervisor, UEFI firmware, and a browser. Results show that on average TyPM refines indirect-call targets produced by the state of the art by an additional 31%-91%. TyPM can also remove 99.9% of modules for memory-write instructions to prevent them from corrupting critical data structures in the Linux kernel.
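The indirect-call refinement application can be pictured with a small, language-agnostic sketch: candidate targets of an indirect call are restricted to address-taken functions whose signature matches the function-pointer type at the call site. This shows only plain type matching, which TyPM further tightens by asking which modules can ever pass a function pointer of that type to the calling module; all names below are hypothetical.

```python
from collections import defaultdict

# Each function is modeled by its type signature; an indirect call site is
# resolved only to address-taken functions whose signature matches the
# function-pointer type used at the call site.
functions = {
    "parse_http":  ("int", ("char*", "size_t")),
    "parse_ftp":   ("int", ("char*", "size_t")),
    "log_message": ("void", ("char*",)),
}
address_taken = {"parse_http", "parse_ftp", "log_message"}

targets_by_type = defaultdict(set)
for name, sig in functions.items():
    if name in address_taken:
        targets_by_type[sig].add(name)

# An indirect call through an `int (*)(char*, size_t)` pointer is refined
# to the two parsers; log_message is ruled out by its type alone.
callsite_type = ("int", ("char*", "size_t"))
print(targets_by_type[callsite_type])   # {'parse_http', 'parse_ftp'}
```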
{"title":"Practical Program Modularization with Type-Based Dependence Analysis","authors":"Kangjie Lu","doi":"10.1109/SP46215.2023.10179412","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179412","url":null,"abstract":"Today's software programs are bloating and have become extremely complex. As there is typically no internal isolation among modules in a program, a vulnerability can be exploited to corrupt the memory and take control of the whole program. Program modularization is thus a promising security mechanism that splits a complex program into smaller modules, so that memory-access instructions can be constrained from corrupting irrelevant modules. A general approach to realizing program modularization is dependence analysis which determines if an instruction is independent of specific code or data; and if so, it can be modularized. Unfortunately, dependence analysis in complex programs is generally considered infeasible, due to problems in data-flow analysis, such as unknown indirect-call targets, pointer aliasing, and path explosion. As a result, we have not seen practical automated program modularization built on dependence analysis.This paper presents a breakthrough—Type-based dependence analysis for Program Modularization (TyPM). Its goal is to determine which modules in a program can never pass a type of object (including references) to a memory-access instruction; therefore, objects of this type that are created by these modules can never be valid targets of the instruction. The idea is to employ a type-based analysis to first determine which types of data flows can take place between two modules, and then transitively resolve all dependent modules of a memory-access instruction, with respect to the specific type. Such an approach avoids the data-flow analysis and can be practical. We develop two important security applications based on TyPM: refining indirect-call targets and protecting critical data structures. We extensively evaluate TyPM with various system software, including an OS kernel, a hypervisor, UEFI firmware, and a browser. Results show that on average TyPM additionally refines indirect-call targets produced by the state of the art by 31%-91%. TyPM can also remove 99.9% of modules for memory-write instructions to prevent them from corrupting critical data structures in the Linux kernel.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122863974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D-ARM: Disassembling ARM Binaries by Lightweight Superset Instruction Interpretation and Graph Modeling
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179307
Yapeng Ye, Zhuo Zhang, Qingkai Shi, Yousra Aafer, X. Zhang
ARM binary analysis has a wide range of applications in ARM system security. A fundamental challenge is ARM disassembly. ARM, particularly AArch32, has a number of unique features making disassembly distinct from x86 disassembly, such as the mixing of ARM and Thumb instruction modes, implicit mode switching within an application, and more prevalent use of inlined data. Existing techniques cannot achieve high accuracy when binaries become complex and have undergone obfuscation. We propose a novel ARM binary disassembly technique that is particularly designed to address challenges in legacy code for 32-bit ARM binaries. It features a lightweight superset instruction interpretation method to derive rich semantic information and a graph-theory based method that aggregates such information to produce final results. Our comparative evaluation with a number of state-of-the-art disassemblers, including Ghidra, IDA, P-Disasm, XDA, D-Disasm, and Spedi, on thousands of binaries generated from SPEC2000 and SPEC2006 with various settings, as well as real-world applications collected online, shows that our technique D-ARM substantially outperforms the baselines.
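The "superset" part of the approach can be illustrated with Capstone: decode a candidate instruction at every byte offset in both ARM and Thumb modes, producing the over-approximate instruction set from which a disassembler would then select consistent results. The sketch below covers only this decoding step, not D-ARM's semantic interpretation or graph-based aggregation, and the example bytes are arbitrary.

```python
from capstone import Cs, CS_ARCH_ARM, CS_MODE_ARM, CS_MODE_THUMB

def superset_decode(code: bytes, base: int):
    """Decode a candidate instruction at every byte offset in both ARM and
    Thumb modes; a real disassembler would then pick a consistent subset
    of these candidates (e.g., via graph modeling)."""
    decoders = {"arm": Cs(CS_ARCH_ARM, CS_MODE_ARM),
                "thumb": Cs(CS_ARCH_ARM, CS_MODE_THUMB)}
    candidates = []
    for off in range(len(code)):
        for mode, md in decoders.items():
            for insn in md.disasm(code[off:], base + off, count=1):
                candidates.append((insn.address, mode, insn.mnemonic,
                                   insn.op_str, insn.size))
    return candidates

# Arbitrary example bytes: the same buffer decodes differently per mode and offset.
blob = bytes.fromhex("2de9f0410446")
for addr, mode, mnem, ops, size in superset_decode(blob, 0x1000):
    print(f"{addr:#x} [{mode}] {mnem} {ops}  ({size} bytes)")
```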
{"title":"D-ARM: Disassembling ARM Binaries by Lightweight Superset Instruction Interpretation and Graph Modeling","authors":"Yapeng Ye, Zhuo Zhang, Qingkai Shi, Yousra Aafer, X. Zhang","doi":"10.1109/SP46215.2023.10179307","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179307","url":null,"abstract":"ARM binary analysis has a wide range of applications in ARM system security. A fundamental challenge is ARM disassembly. ARM, particularly AArch32, has a number of unique features making disassembly distinct from x86 disassembly, such as the mixing of ARM and Thumb instruction modes, implicit mode switching within an application, and more prevalent use of inlined data. Existing techniques cannot achieve high accuracy when binaries become complex and have undergone obfuscation. We propose a novel ARM binary disassembly technique that is particularly designed to address challenges in legacy code for 32-bit ARM binaries. It features a lightweight superset instruction interpretation method to derive rich semantic information and a graph-theory based method that aggregates such information to produce final results. Our comparative evaluation with a number of state-of-the-art disassemblers, including Ghidra, IDA, P-Disasm, XDA, D-Disasm, and Spedi, on thousands of binaries generated from SPEC2000 and SPEC2006 with various settings, and real-world applications collected online show that our technique D-ARM substantially outperforms the baselines.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127890300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finding Specification Blind Spots via Fuzz Testing
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179438
Ru Ji, Meng Xu
A formally verified program is only as correct as its specifications (SPEC). But how can one assure that the SPEC is complete and free of loopholes? This paper presents Fast, short for Fuzzing-Assisted Specification Testing, as a potential answer. The key insight is to exploit and synergize the "redundancy" and "diversity" in formally verified programs for cross-checking. Specifically, within the same codebase, SPEC, implementation (CODE), and test suites are all derived from the same set of business requirements. Therefore, if some intention is captured in CODE and test cases but not in SPEC, this is a strong indication that there is a blind spot in SPEC. Fast examines the SPEC for incompleteness issues in an automated way: it first locates SPEC gaps via mutation testing, i.e., by checking whether a CODE variant conforms to the original SPEC. If so, Fast further leverages the test suites to infer whether the gap is introduced by intention or by mistake. Depending on the codebase size, Fast may choose to generate CODE variants in either an enumerative or evolutionary way. Fast is applied to two open-source codebases that feature formal verification and helps to confirm 13 and 21 blind spots in their SPEC, respectively. This highlights the prevalence of SPEC incompleteness in real-world applications.
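The cross-checking insight can be illustrated with a toy example in which CODE, SPEC, and tests all derive from the same requirement, but the SPEC is incomplete: a mutant that still satisfies the SPEC yet fails the tests exposes the blind spot. This is only an illustration of the idea, not Fast's actual mutation operators or workflow.

```python
# Toy illustration: CODE, SPEC, and tests all derive from the same requirement
# ("return the absolute value"), but the SPEC below only constrains the sign.
def code(x):
    return -x if x < 0 else x

def spec(f):                       # incomplete SPEC: output is non-negative
    return all(f(x) >= 0 for x in range(-50, 51))

def tests(f):                      # test suite captures the full intention
    return f(-3) == 3 and f(4) == 4 and f(0) == 0

def mutant(x):                     # a CODE variant produced by mutation
    return x * x if x < 0 else x

# The mutant still satisfies the SPEC but fails the tests: strong evidence
# that the SPEC has a blind spot (it never relates the output to |x|).
assert spec(code) and tests(code)
assert spec(mutant) and not tests(mutant)
```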
{"title":"Finding Specification Blind Spots via Fuzz Testing","authors":"Ru Ji, Meng Xu","doi":"10.1109/SP46215.2023.10179438","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179438","url":null,"abstract":"A formally verified program is only as correct as its specifications (SPEC). But how to assure that the SPEC is complete and free of loopholes? This paper presents Fast, short for Fuzzing-Assisted Specification Testing, as a potential answer. The key insight is to exploit and synergize the \"redundancy\" and \"diversity\" in formally verified programs for cross-checking. Specifically, within the same codebase, SPEC, implementation (CODE), and test suites are all derived from the same set of business requirements. Therefore, if some intention is captured in CODE and test case but not in SPEC, this is a strong indication that there is a blind spot in SPEC.Fast examines the SPEC for incompleteness issues in an automated way: it first locates SPEC gaps via mutation testing, i.e., by checking whether a CODE variant conforms to the original SPEC. If so, Fast further leverages the test suites to infer whether the gap is introduced by intention or by mistake. Depending on the codebase size, Fast may choose to generate CODE variants in either an enumerative or evolutionary way. Fast is applied to two open-source codebases that feature formal verification and helps to confirm 13 and 21 blind spots in their SPEC respectively. This highlights the prevalence of SPEC incompleteness in real-world applications.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128821953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction
Pub Date: 2023-05-01 | DOI: 10.1109/SP46215.2023.10179409
Hanshen Xiao, Zihang Xiang, Di Wang, S. Devadas
We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradient. As one of the most popular but artificial operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either learning models or input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically instructing improvement of privacy or utility. In general, without an assumption on globally-bounded gradient, classic convergence analyses do not apply to clipped gradient descent. Further, given limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in the applications of private deep learning. In this paper, we provide meaningful theoretical analysis validated by thorough empirical results of DP-SGD. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show one key factor contributing to the bias is the sampling noise of the stochastic gradient to be clipped. Accordingly, we use the developed theory to build a series of improvements for sampling noise reduction from various perspectives. From an optimization angle, we study variance reduction techniques and propose inner-outer momentum. At the learning model (neural network) level, we propose several tricks to enhance network internal normalization and BatchClipping to carefully clip the gradient of a batch of samples. For data preprocessing, we provide theoretical justification of recently proposed improvements via data normalization and (self-)augmentation. Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. For example, in computer vision applications, with an (ϵ = 8, δ = 10^-5) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracy 76.0% and 90.1%, respectively; for natural language processing, with (ϵ = 4, δ = 10^-5), we successfully train a recurrent neural network on IMDb data with test accuracy 77.5%.
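A scalar toy experiment (not from the paper) illustrates the claim that sampling noise in the to-be-clipped stochastic gradient drives the bias: clipping each noisy per-sample gradient and then averaging lands well below what clipping the noise-free gradient would give, while averaging a batch before clipping, in the spirit of BatchClipping, largely removes the gap. The threshold and noise level below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad, sigma, C = 3.0, 4.0, 1.0      # population gradient, sampling noise, clip threshold
clip = lambda g: np.clip(g, -C, C)       # scalar clipping with threshold C

# Per-sample stochastic gradients = true gradient + sampling noise.
samples = true_grad + rng.normal(0.0, sigma, size=64 * 1600)

per_sample_est = clip(samples).mean()                  # clip each sample, then average
batch_means = samples.reshape(-1, 64).mean(axis=1)     # average 64 samples first...
batch_est = clip(batch_means).mean()                   # ...then clip the batch gradient

print(clip(true_grad))    # 1.0: what clipping the noise-free gradient would give
print(per_sample_est)     # well below 1.0: sampling noise biases the clipped estimate
print(batch_est)          # close to 1.0: reducing sampling noise before clipping reduces the bias
```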
{"title":"A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction","authors":"Hanshen Xiao, Zihang Xiang, Di Wang, S. Devadas","doi":"10.1109/SP46215.2023.10179409","DOIUrl":"https://doi.org/10.1109/SP46215.2023.10179409","url":null,"abstract":"We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradient. As one of the most popular but artificial operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either learning models or input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically instructing improvement of privacy or utility. In general, without an assumption on globally-bounded gradient, classic convergence analyses do not apply to clipped gradient descent. Further, given limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in the applications of private deep learning.In this paper, we provide meaningful theoretical analysis validated by thorough empirical results of DP-SGD. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show one key factor contributing to the bias is the sampling noise of stochastic gradient to be clipped. Accordingly, we use the developed theory to build a series of improvements for sampling noise reduction from various perspectives. From an optimization angle, we study variance reduction techniques and propose inner-outer momentum. At the learning model (neural network) level, we propose several tricks to enhance network internal normalization and BatchClipping to carefully clip the gradient of a batch of samples. For data preprocessing, we provide theoretical justification of recently proposed improvements via data normalization and (self-)augmentation.Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. For example, in computer vision applications, with an (ϵ = 8, δ = 10−5) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracy 76.0% and 90.1%, respectively; for natural language processing, with (ϵ = 4, δ = 10−5), we successfully train a recurrent neural network on IMDb data with test accuracy 77.5%.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114501418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}