Distributed data stores have been rapidly evolving to serve the needs of large-scale applications such as online gaming and real-time targeting. In particular, distributed key-value stores have been widely adopted due to their superior performance. However, these systems do not guarantee strong protection of data confidentiality, and as a result fall short of addressing the serious privacy concerns raised by massive data breaches. In this paper, we introduce EncKV, an encrypted key-value store with secure rich query support. First, EncKV stores encrypted data records with multiple secondary attributes in the form of encrypted key-value pairs. Second, it leverages the latest practical primitives for searching over encrypted data, i.e., searchable symmetric encryption and order-revealing encryption, and provides encrypted indexes with guaranteed security to support exact-match and range-match queries via secondary attributes of data records. Third, it carefully integrates these indexes into a distributed index framework to facilitate secure query processing in parallel. To mitigate recent inference attacks on encrypted database systems, EncKV protects the order information during range queries, and presents an interactive batch query mechanism to further hide the associations across data values on different attributes. We implement an EncKV prototype on a Redis cluster, and conduct an extensive set of performance evaluations on the Amazon EC2 public cloud platform. Our results show that EncKV effectively preserves the efficiency and scalability of plaintext distributed key-value stores.
{"title":"EncKV: An Encrypted Key-value Store with Rich Queries","authors":"Xingliang Yuan, Yu Guo, Xinyu Wang, Cong Wang, Baochun Li, X. Jia","doi":"10.1145/3052973.3052977","DOIUrl":"https://doi.org/10.1145/3052973.3052977","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"442 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76329757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
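The idea of an encrypted exact-match index over a secondary attribute can be illustrated with a toy sketch. This is not EncKV's actual construction: it is a generic searchable-symmetric-encryption pattern, assuming HMAC-SHA256 as the pseudo-random function, record identifiers of at most 32 bytes, and a plain Python dict standing in for the key-value store. All function names are hypothetical.

```python
import hashlib
import hmac

ID_BYTES = 32  # assumed fixed width for record identifiers


def prf(key: bytes, message: bytes) -> bytes:
    """Keyed pseudo-random function, instantiated here with HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_token(key: bytes, attribute: str, value: str) -> bytes:
    """Client side: derive the search token for one attribute value."""
    return prf(key, attribute.encode() + b"|" + value.encode())


def build_index(key: bytes, records: dict, attribute: str) -> dict:
    """Client side: map pseudo-random labels to masked record identifiers."""
    index = {}
    counters = {}
    for record_id, attrs in records.items():
        raw_id = record_id.encode()
        if len(raw_id) > ID_BYTES:
            raise ValueError("record id too long for this sketch")
        value = attrs[attribute]
        count = counters.get(value, 0)
        counters[value] = count + 1
        token = make_token(key, attribute, value)
        suffix = count.to_bytes(4, "big")
        label = prf(token, b"label" + suffix)
        mask = prf(token, b"mask" + suffix)
        index[label] = xor_bytes(raw_id.ljust(ID_BYTES, b"\x00"), mask)
    return index


def search(index: dict, token: bytes) -> list:
    """Server side: follow the counter until a label is missing."""
    results = []
    count = 0
    while True:
        suffix = count.to_bytes(4, "big")
        label = prf(token, b"label" + suffix)
        if label not in index:
            return results
        mask = prf(token, b"mask" + suffix)
        unmasked = xor_bytes(index[label], mask)
        results.append(unmasked.rstrip(b"\x00").decode())
        count += 1
```

Without a token the index entries look like random key-value pairs, which is why such an index can be spread across the nodes of an ordinary key-value store. A query reveals only the token for the value being searched and the entries that match it.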
Harshal Tupsamudre, Vijayanand Banahatti, S. Lodha, Ketan Vyas
The graphical pattern unlock scheme, which requires users to connect a minimum of 4 nodes on a 3X3 grid, is one of the most popular authentication mechanisms on mobile devices. However, prior research suggests that users' pattern choices are highly biased and hence vulnerable to guessing attacks. Moreover, 3X3 pattern choices are devoid of features such as longer stroke lengths, direction changes and intersections that are considered important in preventing shoulder-surfing attacks. We attribute these insecure practices to the geometry of the grid and its complicated drawing rules, which prevent users from realising the full potential of graphical passwords. In this paper, we propose and explore an alternate circular layout, referred to as Pass-O, which unlike the grid layout allows a connection between any two nodes, thus simplifying the pattern drawing rules. Consequently, Pass-O produces a theoretical search space of 985,824 patterns, almost 2.5 times greater than that of the 3X3 grid layout. We compare the security of 3X3 and Pass-O patterns theoretically as well as empirically. Theoretically, Pass-O patterns are uniform and have greater visual complexity due to a large number of intersections. To perform the empirical analysis, we conduct a large-scale web-based user study and collect more than 123,000 patterns from 21,053 users. After examining user-chosen 3X3 and Pass-O patterns across different metrics such as pattern length, stroke length, start point, end point, repetitions, number of direction changes and intersections, we find that Pass-O patterns are much more secure than 3X3 patterns.
{"title":"Pass-O: A Proposal to Improve the Security of Pattern Unlock Scheme","authors":"Harshal Tupsamudre, Vijayanand Banahatti, S. Lodha, Ketan Vyas","doi":"10.1145/3052973.3053041","DOIUrl":"https://doi.org/10.1145/3052973.3053041","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76343637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
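The search-space figures quoted in the Pass-O abstract can be checked by direct counting. The sketch below is not taken from the paper. It assumes that Pass-O permits every ordered selection of 4 to 9 distinct nodes, and that the 3X3 grid follows the usual rule that a stroke may pass over an intermediate node only if that node is already part of the pattern. Function names are illustrative.

```python
from math import perm

GRID_SIZE = 3
NODES = GRID_SIZE * GRID_SIZE
MIN_LENGTH = 4


def count_pass_o_patterns() -> int:
    """Any two nodes may be joined, so a pattern of length k is simply an
    ordered selection of k distinct nodes out of 9."""
    return sum(perm(NODES, k) for k in range(MIN_LENGTH, NODES + 1))


def midpoint(a: int, b: int):
    """Return the grid node lying exactly between a and b, or None."""
    row_a, col_a = divmod(a, GRID_SIZE)
    row_b, col_b = divmod(b, GRID_SIZE)
    if (row_a + row_b) % 2 == 0 and (col_a + col_b) % 2 == 0:
        return ((row_a + row_b) // 2) * GRID_SIZE + (col_a + col_b) // 2
    return None


def count_grid_patterns() -> int:
    """Depth-first enumeration of valid 3X3 patterns of length 4 to 9."""

    def extend(current: int, visited: frozenset, length: int) -> int:
        total = 1 if length >= MIN_LENGTH else 0
        for nxt in range(NODES):
            if nxt in visited:
                continue
            mid = midpoint(current, nxt)
            # Jumping over an unvisited node is not allowed on the grid.
            if mid is not None and mid not in visited:
                continue
            total += extend(nxt, visited | {nxt}, length + 1)
        return total

    return sum(extend(s, frozenset({s}), 1) for s in range(NODES))


if __name__ == "__main__":
    pass_o = count_pass_o_patterns()
    grid = count_grid_patterns()
    print(pass_o, grid, round(pass_o / grid, 2))
```

Under these assumptions the first count is 985,824, matching the abstract, and the ratio of the two counts is about 2.5.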
The COPA authenticated encryption mode was proved to have birthday-bound security on integrity, and its instantiation AES-COPA (v1/2) was claimed or conjectured to have full security on tag guessing. The Marble (v1.0/1.1/1.2) authenticated encryption algorithm was claimed to have full security on authenticity. Both AES-COPA (v1) and Marble (v1.0) were submitted to the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) in 2014; Marble was revised twice (v1.1/1.2) in the first round of CAESAR, and AES-COPA (v1) was tweaked (v2) for the second round. In this paper, we cryptanalyse the basic cases of COPA, AES-COPA and Marble, which process messages whose length is a multiple of the block size. We present collision-based almost universal forgery attacks on the basic cases of COPA, AES-COPA (v1/2) and Marble (v1.0/1.1/1.2), and show that the basic cases of COPA and AES-COPA have roughly at most birthday-bound security on tag guessing and that the basic case of Marble has roughly at most birthday-bound security on authenticity. The attacks on COPA and AES-COPA do not violate their birthday-bound security proof on integrity, but the attack on AES-COPA violates its full security claim or conjecture on tag guessing. Therefore, the full security claim or conjecture on tag guessing of AES-COPA and the full security claim on authenticity of Marble are far overestimated with respect to the general understanding of full security for these notions. Designers should pay attention to these attacks when designing authenticated encryption algorithms with similar structures in the future, and should be careful about claiming security under an advanced form of a security notion without a corresponding proof when security has been proved only under the notion's most fundamental form.
{"title":"Almost Universal Forgery Attacks on the COPA and Marble Authenticated Encryption Algorithms","authors":"Jiqiang Lu","doi":"10.1145/3052973.3052981","DOIUrl":"https://doi.org/10.1145/3052973.3052981","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"84 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73754244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
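The gap between "birthday-bound" and "full" security in the COPA and Marble abstract can be made concrete with generic arithmetic. The sketch below is not the attack from the paper. It only evaluates the standard birthday approximation for a collision among random n-bit values and compares it with the success probability of blind tag guessing, assuming a 128-bit block and tag as in AES. Function names are illustrative.

```python
import math


def collision_probability(queries: int, block_bits: int) -> float:
    """Birthday approximation: 1 - exp(-q(q-1) / 2^(n+1))."""
    exponent = -queries * (queries - 1) / 2 ** (block_bits + 1)
    return -math.expm1(exponent)


def guessing_probability(queries: int, tag_bits: int) -> float:
    """Success chance of q independent blind guesses at a t-bit tag."""
    return min(1.0, queries / 2 ** tag_bits)


if __name__ == "__main__":
    n = 128
    for log_q in (32, 48, 64):
        q = 2 ** log_q
        print(
            f"q = 2^{log_q}: "
            f"collision ~ {collision_probability(q, n):.3e}, "
            f"blind guess ~ {guessing_probability(q, n):.3e}"
        )
```

With about 2^64 queries a collision becomes likely, whereas blind guessing of a 128-bit tag still succeeds only with negligible probability. A collision-based forgery therefore caps security at roughly the birthday bound, far below a full-security claim.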
{"title":"Session details: Mobile Apps & Markets","authors":"W. Enck","doi":"10.1145/3248548","DOIUrl":"https://doi.org/10.1145/3248548","url":null,"abstract":"","PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73829628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Storage Security","authors":"Long Lu","doi":"10.1145/3248556","DOIUrl":"https://doi.org/10.1145/3248556","url":null,"abstract":"","PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76160995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Embedded Systems Security 1","authors":"Daphne Yao","doi":"10.1145/3248549","DOIUrl":"https://doi.org/10.1145/3248549","url":null,"abstract":"","PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79112659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samaneh Tajalizadehkhoob, C. Gañán, Arman Noroozian, M. V. Eeten
A variety of botnets are used in attacks on financial services. Banks and security firms invest a lot of effort in detecting and combating malware-assisted takeover of customer accounts. A critical resource of these botnets is their command-and-control (C&C) infrastructure. Attackers rent or compromise servers to operate their C&C infrastructure. Hosting providers routinely take down C&C servers, but the effectiveness of this mitigation strategy depends on understanding how attackers select the hosting providers to host their servers. Do they prefer, for example, providers who are slow or unwilling to take down C&Cs? In this paper, we analyze 7 years of data on the C&C servers of botnets that have engaged in attacks on financial services. Our aim is to understand whether attackers prefer certain types of providers or whether their C&Cs are randomly distributed across the whole attack surface of the hosting industry. We extract a set of structural properties of providers to capture the attack surface. We model the distribution of C&Cs across providers and show that the mere size of the provider can explain around 71% of the variance in the number of C&Cs per provider, whereas the rule of law in the country explains only around 1%. We further observe that the price, time in business, popularity and ratio of vulnerable websites of providers are significantly related to C&C counts. Finally, we find that the speed with which providers take down C&C domains has only a weak relation with C&C occurrence rates, adding only 1% of explained variance. This suggests attackers have little to no preference for providers who allow long-lived C&C domains.
{"title":"The Role of Hosting Providers in Fighting Command and Control Infrastructure of Financial Malware","authors":"Samaneh Tajalizadehkhoob, C. Gañán, Arman Noroozian, M. V. Eeten","doi":"10.1145/3052973.3053023","DOIUrl":"https://doi.org/10.1145/3052973.3053023","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84470166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Networks are vulnerable to disruptions caused by malicious forwarding devices. The situation is likely to worsen in Software Defined Networks (SDNs) because of the incompatibility of existing solutions, the use of programmable soft switches and the potential of bringing down an entire network through compromised forwarding devices. In this paper, we present WedgeTail, an Intrusion Prevention System (IPS) designed to secure the SDN data plane. WedgeTail regards forwarding devices as points within a geometric space and stores the paths that packets take when traversing the network as trajectories. To be efficient, it prioritizes forwarding devices before inspection using an unsupervised trajectory-based sampling mechanism. For each forwarding device, WedgeTail computes the expected and actual trajectories of packets and 'hunts' for any forwarding device not processing packets as expected. Compared to related work, WedgeTail is also capable of distinguishing between malicious actions such as packet drop and generation. Moreover, WedgeTail employs a radically different methodology that enables detecting threats autonomously. In fact, it does not rely on rules pre-defined by an administrator and may be easily imported to protect SDN networks with different setups, forwarding devices, and controllers. We have evaluated WedgeTail in simulated environments, and it has been capable of detecting and responding to all implanted malicious forwarding devices within a reasonable time-frame. We report on the design, implementation, and evaluation of WedgeTail in this manuscript.
{"title":"WedgeTail: An Intrusion Prevention System for the Data Plane of Software Defined Networks","authors":"Arash Shaghaghi, M. Kâafar, Sanjay Jha","doi":"10.1145/3052973.3053039","DOIUrl":"https://doi.org/10.1145/3052973.3053039","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"81 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83969174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
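The comparison of expected and actual packet trajectories described in the WedgeTail abstract can be pictured with a toy sketch. This is not WedgeTail's algorithm. It assumes that each trajectory is a list of device identifiers recorded when a device receives a packet, that packets carry a hypothetical unique identifier, and that only three outcomes are distinguished. All names are illustrative.

```python
def classify_trajectories(expected: dict, actual: dict) -> list:
    """Compare expected and observed paths, both {packet_id: [device, ...]}.

    Returns (packet_id, suspect_device, finding) tuples, where finding is
    "generation", "drop" or "detour".
    """
    findings = []

    # A packet seen in the network that no policy predicts was injected.
    for packet_id, path in actual.items():
        if packet_id not in expected and path:
            findings.append((packet_id, path[0], "generation"))

    for packet_id, want in expected.items():
        got = actual.get(packet_id, [])
        if got == want:
            continue
        # Length of the common prefix of the two paths.
        i = 0
        while i < len(got) and i < len(want) and got[i] == want[i]:
            i += 1
        if i == len(got):
            # The observed path stops early: the last device that received
            # the packet (or the ingress device) failed to forward it.
            suspect = got[-1] if got else want[0]
            findings.append((packet_id, suspect, "drop"))
        else:
            # The path diverges: blame the last device both paths agree on.
            suspect = got[i - 1] if i > 0 else got[0]
            findings.append((packet_id, suspect, "detour"))
    return findings


if __name__ == "__main__":
    expected = {"p1": ["s1", "s2", "s3"], "p2": ["s1", "s3"]}
    actual = {"p1": ["s1", "s2"], "p2": ["s1", "s3"], "p9": ["s2", "s3"]}
    for finding in classify_trajectories(expected, actual):
        print(finding)
```

In this example the packet p1 vanishes after reaching s2, so s2 is flagged for a drop, while p9 has no expected trajectory and is attributed to s2 as a generated packet.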
Ruan de Clercq, Ronald De Keulenaer, Pieter Maene, B. Preneel, B. D. Sutter, I. Verbauwhede
An increasing number of applications implemented on a SoC (system-on-chip) require security features. This work addresses the issue of protecting the integrity of code and read-only data that is stored in memory. To this end, we propose a new architecture called SCM, which works as a standalone IP core in a SoC. To the best of our knowledge, there exist no architectural elements similar to SCM that offer the same strict security guarantees while, at the same time, not requiring any modifications to other IP cores in the SoC design. In addition, SCM has the flexibility to select the parts of the software to be protected, which eases the integration of our solution with existing software. SCM was evaluated on the Zynq platform, which features an ARM processor and an FPGA. The design was evaluated by executing a number of different benchmarks from memory protected by SCM, and we found that it introduces minimal overhead to the system.
{"title":"SCM: Secure Code Memory Architecture","authors":"Ruan de Clercq, Ronald De Keulenaer, Pieter Maene, B. Preneel, B. D. Sutter, I. Verbauwhede","doi":"10.1145/3052973.3053044","DOIUrl":"https://doi.org/10.1145/3052973.3053044","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"362 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89299692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code reuse detection is a key technique in reverse engineering. However, existing source code similarity comparison techniques are not applicable to binary code. Moreover, compilers make this problem even more difficult, because they can generate different assembly code and control flow structures even when implementing the same functionality. To address this problem, we present a fuzzy matching approach to compare two functions. We first obtain an initial mapping between basic blocks by leveraging the concept of longest common subsequence at the basic block level and the execution path level. We then extend the resulting mapping using neighborhood exploration. To make our approach applicable to large data sets, we designed an effective filtering process using Minhashing. Based on the proposed approach, we implemented a tool named BinSequence and conducted extensive experiments with it. Our results show that, given a large assembly code repository with millions of functions, BinSequence is efficient and can attain a high-quality similarity ranking of assembly functions with an accuracy above 90%. We also present several practical use cases including patch analysis, malware analysis and bug search.
{"title":"BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection","authors":"He Huang, A. Youssef, M. Debbabi","doi":"10.1145/3052973.3052974","DOIUrl":"https://doi.org/10.1145/3052973.3052974","url":null,"PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87593819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
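Two building blocks named in the BinSequence abstract, longest common subsequence and Minhashing, can be sketched generically. The code below is not the BinSequence implementation. It assumes that a basic block is represented as a list of normalized instruction mnemonics and that a function is summarized by a non-empty set of string tokens. The function names and the choice of SHA-256 as the hash family are illustrative.

```python
import hashlib


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, using O(len(b)) memory."""
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def minhash_signature(tokens: set, num_hashes: int = 64) -> list:
    """For each salted hash function, keep the minimum hash over the set."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.sha256(salt + token.encode()).digest()[:8], "big"
                )
                for token in tokens
            )
        )
    return signature


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of agreeing positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    block_a = ["push", "mov", "add", "ret"]
    block_b = ["push", "add", "ret"]
    print(lcs_length(block_a, block_b))
    sig_a = minhash_signature(set(block_a))
    sig_b = minhash_signature(set(block_b))
    print(estimated_jaccard(sig_a, sig_b))
```

In a pipeline of this kind the cheap signature comparison would discard clearly dissimilar candidates first, so that the more expensive sequence comparison runs only on the functions that remain.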