Wesley Jin, Cory F. Cohen, Jeffrey Gennari, C. Hines, S. Chaki, A. Gurfinkel, Jeffrey Havrilla, P. Narasimhan
Object-oriented programming complicates the already difficult task of reverse engineering software, and is being used increasingly by malware authors. Unlike with traditional procedural-style code, reverse engineers must understand the complex interactions between object-oriented methods and the shared data structures on which they operate, a tedious manual process. In this paper, we present a static approach that uses symbolic execution and inter-procedural data-flow analysis to discover object instances, data members, and methods of a common class. The key idea behind our work is to track the propagation and usage of a unique object instance reference, called a this pointer. Our goal is to help malware reverse engineers understand how classes are laid out and identify their methods. We have implemented our approach in a tool called ObJDIGGER, which produced encouraging results when validated on real-world malware samples.
Recovering C++ Objects From Binaries Using Inter-Procedural Data-Flow Analysis. Wesley Jin, Cory F. Cohen, Jeffrey Gennari, C. Hines, S. Chaki, A. Gurfinkel, Jeffrey Havrilla, P. Narasimhan. PPREW'14, 2014-01-22. https://doi.org/10.1145/2556464.2556465
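The this-pointer tracking idea in the ObJDIGGER abstract can be illustrated with a drastically simplified sketch. It assumes a hypothetical toy instruction format, straight-line code, and MSVC-style __thiscall (this passed in ecx); it is intra-procedural and has no symbolic execution, so it only shows the core intuition: methods and member offsets reached by the same pointer are grouped into the same class.

```python
from collections import defaultdict

# Hypothetical toy IR (an assumption for illustration, not ObJDIGGER's own):
#   ("new", dst)            dst receives a pointer to a freshly allocated object
#   ("mov", dst, src)       copy register src into register dst
#   ("call", target)        __thiscall: the this pointer is passed in ecx
#   ("store", base, offset) write to memory at [base + offset]
VOLATILE = ("eax", "ecx", "edx")  # registers a callee may clobber on x86

def recover_classes(instrs):
    """Group called methods and written member offsets by the object whose
    this pointer reaches them (intra-procedural, straight-line code only)."""
    regs = {}                   # register -> abstract object id
    next_id = 0
    methods = defaultdict(set)  # object id -> methods invoked on it
    members = defaultdict(set)  # object id -> member offsets written
    for ins in instrs:
        op = ins[0]
        if op == "new":
            regs[ins[1]] = next_id
            next_id += 1
        elif op == "mov":
            _, dst, src = ins
            if src in regs:
                regs[dst] = regs[src]
            else:
                regs.pop(dst, None)  # dst no longer holds a tracked pointer
        elif op == "call":
            if "ecx" in regs:
                methods[regs["ecx"]].add(ins[1])
            for r in VOLATILE:       # the callee may overwrite these
                regs.pop(r, None)
        elif op == "store":
            _, base, offset = ins
            if base in regs:
                members[regs[base]].add(offset)
    return dict(methods), dict(members)

program = [
    ("new", "eax"),
    ("mov", "esi", "eax"),      # keep the this pointer in a callee-saved register
    ("mov", "ecx", "esi"),
    ("call", "sub_401000"),     # e.g. the constructor
    ("store", "esi", 4),        # data member at offset 4
    ("mov", "ecx", "esi"),
    ("call", "sub_401050"),
    ("new", "eax"),             # a second, unrelated object
    ("mov", "ecx", "eax"),
    ("call", "sub_402000"),
]
methods, members = recover_classes(program)
print(methods)
print(members)
```

Here object 0 collects sub_401000 and sub_401050 plus a member at offset 4, while object 1 gets only sub_402000. The real tool must also follow the pointer across procedure boundaries and through memory, which this sketch deliberately omits.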
We present an automated method for extracting familial signatures for Android malware, i.e., signatures that identify malware produced by piggybacking potentially different benign applications with the same (or similar) malicious code. The APK classes that constitute the malicious code in a repackaged application are separated from the benign code, and the Android API calls used by the malicious modules are extracted to create a signature. A piggybacked malicious app can be detected by first decomposing it into loosely coupled modules and then matching the Android API calls made by each module against the signatures of known malware families. Because the signatures are based on Android API calls, they reflect core malware behavior and are therefore more resilient to obfuscation. During triage, AV companies need to classify large numbers of samples automatically in order to optimize the assignment of human analysts. They need a system with few false negatives, even at the cost of more false positives. With this goal in mind, we fine-tuned our system and used standard 10-fold cross-validation over a dataset of 1,052 malicious APKs and 48 benign APKs to verify our algorithm. Results show 94% accuracy, 97% precision, and 93% recall when separating benign apps from malware. We classified our entire malware dataset into 11 families with 98% accuracy, 87% precision, and 94% recall.
DroidLegacy: Automated Familial Classification of Android Malware. Luke Deshotels, Vivek Notani, Arun Lakhotia. PPREW'14, 2014-01-22. https://doi.org/10.1145/2556464.2556467
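The DroidLegacy matching step (compare each module's API calls against per-family signatures) can be sketched as follows. This is an assumption-laden illustration: the family labels and their API-call sets are made up, and Jaccard similarity with a fixed threshold is a swapped-in measure, not necessarily the one the paper uses.

```python
# Illustrative family signatures; the family labels are hypothetical and the
# API-call sets are made up for this example, not taken from DroidLegacy.
SIGNATURES = {
    "family_A": {
        "android.telephony.SmsManager.sendTextMessage",
        "android.telephony.TelephonyManager.getDeviceId",
    },
    "family_B": {
        "java.net.URL.openConnection",
        "android.telephony.TelephonyManager.getSubscriberId",
    },
}

def jaccard(a, b):
    """Jaccard similarity of two API-call sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(modules, signatures, threshold=0.6):
    """Return the family whose signature best matches any module, or None.

    modules: one API-call set per loosely coupled module of the app.
    A lower threshold trades more false positives for fewer false negatives.
    """
    best_family, best_score = None, 0.0
    for module in modules:
        for family, sig in signatures.items():
            score = jaccard(module, sig)
            if score >= threshold and score > best_score:
                best_family, best_score = family, score
    return best_family

app = [
    {"android.app.Activity.setContentView"},             # benign host code
    {"android.telephony.SmsManager.sendTextMessage",     # piggybacked payload
     "android.telephony.TelephonyManager.getDeviceId",
     "java.net.URL.openConnection"},
]
print(classify(app, SIGNATURES))
```

The second module shares two of three distinct calls with family_A's signature (similarity 2/3), so the app is labeled family_A; the benign host module matches nothing. Matching per module rather than per app is what lets the benign host code be ignored.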
Any inspection, analysis, or reverse engineering of binaries requires a translation of the program text into an intermediate representation (IR) that conveys the semantics of the program. To this end, we propose a domain-specific language called GDSL (Generic Decoder Specification Language) that facilitates the translation from byte streams to instructions and from there to other intermediate representations. We present the GDSL toolkit, containing a compiler from GDSL to C, instruction decoders (currently for Intel x86 and Atmel AVR), translations to semantics, and optimizations of the semantics. Other processors, semantics, and optimizations can be added, thereby providing a common platform for building frontends for the analysis of binaries. The emitted C code is human-readable and outperforms hand-written code such as the XED decoder shipped with the Intel Pin toolkit.
The GDSL toolkit: Generating Frontends for the Analysis of Machine Code. A. Simon, J. Kranz. PPREW'14, 2014-01-22. https://doi.org/10.1145/2556464.2559596
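The GDSL pipeline (byte stream, then instructions, then semantics) can be pictured with a tiny hand-written Python decoder for two Atmel AVR instructions. This is not GDSL syntax or generated code; the tuple instruction format and the register-transfer strings are hypothetical stand-ins, and only NOP and LDI are decoded.

```python
import struct

def decode_avr(code: bytes):
    """Decode 16-bit little-endian AVR words; only NOP and LDI are handled.
    len(code) must be even, otherwise struct.iter_unpack raises struct.error."""
    insns = []
    for (word,) in struct.iter_unpack("<H", code):
        if word == 0x0000:
            insns.append(("nop",))
        elif (word & 0xF000) == 0xE000:
            # LDI Rd, K is encoded as 1110 KKKK dddd KKKK with Rd in r16..r31
            k = ((word >> 4) & 0xF0) | (word & 0x0F)
            d = 16 + ((word >> 4) & 0x0F)
            insns.append(("ldi", d, k))
        else:
            insns.append(("unknown", word))
    return insns

def to_semantics(insn):
    """Translate one decoded instruction into a tiny register-transfer string."""
    if insn[0] == "ldi":
        return f"r{insn[1]} := {insn[2]:#04x}"
    if insn[0] == "nop":
        return "skip"
    return "undefined"

stream = bytes([0x0F, 0xEF, 0x00, 0x00])  # LDI r16, 0xFF ; NOP
for insn in decode_avr(stream):
    print(insn, "->", to_semantics(insn))
```

A generator such as GDSL derives this kind of bit-pattern matching from a declarative specification instead of hand-written masks and shifts, which is where both the maintainability and the performance claims come from.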
Metamorphic malware continuously modifies its code, while preserving its functionality, in order to foil misuse detection. The key to defeating metamorphism lies in a semantic characterization of how the malware is embedded into the target program. Indeed, a behavioral model of program infection that does not rely on syntactic program features should be able to defeat metamorphism. Moreover, a general model of infection should be able to express dependencies and interactions between the malicious code and the target program. Abstract non-interference (ANI) is a general theory for analyzing data dependencies in a program. We propose a higher-order theory of ANI, called HOANI, that makes it possible to study program dependencies. Our idea is then to formalize and study the malware detection problem in terms of HOANI.
Analyzing program dependencies for malware detection. M. Preda, Isabella Mastroeni, R. Giacobazzi. PPREW'14, 2014-01-22. https://doi.org/10.1145/2556464.2556470
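As background for the abstract above, the following sketch checks a simplified form of abstract non-interference by brute force over a small finite domain: public inputs are compared concretely, and the observer sees only an abstraction of the output. It does not model HOANI itself, and the example programs and observers are hypothetical.

```python
def satisfies_ani(program, high_values, low_values, observe):
    """Exhaustively check a simplified abstract non-interference property:
    for every public (low) input, the observer's abstraction of the output
    must be the same for all secret (high) inputs."""
    for low in low_values:
        views = {observe(program(high, low)) for high in high_values}
        if len(views) > 1:
            return False  # the abstract observer can distinguish secrets
    return True

def parity(v):
    return v % 2           # observer that only sees whether the output is odd

def identity(v):
    return v               # concrete observer that sees the exact output

def masked(high, low):
    return low + 2 * high  # the secret changes the output only by an even amount

def leaky(high, low):
    return low + high

print(satisfies_ani(masked, range(4), range(4), parity))    # True
print(satisfies_ani(masked, range(4), range(4), identity))  # False
print(satisfies_ani(leaky, range(4), range(4), parity))     # False
```

The same program can be non-interfering for a weak observer (parity) and interfering for a precise one (identity); parameterizing dependence by the observer's abstraction is the feature of ANI that the abstract builds on.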
Software programs are vulnerable to reverse engineering. Protection usually relies on either obfuscation or Randomized Instruction Set Emulation (RISE). In this article, we explore a mixed software/hardware RISE suitable for embedded systems. This solution is very easy to implement on any open CPU core (LEON, OpenRISC, LatticeMico32, etc.), as it requires only localized changes at the last stage of the code-execution hardware, which renders Dallas and DMA attacks unsuccessful. Similarly, the changes to the software development flow are minor and straightforward. All in all, our study shows that easy protection can be attained at virtually no overhead cost if both the hardware and the software are customized.
Hardware-enforced Protection against Software Reverse-Engineering based on an Instruction Set Encoding. J. Danger, S. Guilley, Florian Praden. PPREW'14, 2014-01-22. https://doi.org/10.1145/2556464.2556469
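The instruction-set-encoding idea in the abstract above can be sketched in software: the toolchain encodes each instruction word with a per-device key, and the decode happens only at the last stage before execution, so memory never holds plaintext code. The 32-bit XOR mask is an assumption borrowed from classic RISE work; the paper's actual encoding may differ, and a single repeated XOR key is weak against known-plaintext analysis.

```python
import secrets
import struct

def encode_program(code: bytes, key: int) -> bytes:
    """Software side: XOR every 32-bit big-endian instruction word with key.
    len(code) must be a multiple of 4."""
    return b"".join(struct.pack(">I", w ^ key)
                    for (w,) in struct.iter_unpack(">I", code))

def fetch(encoded: bytes, pc: int, key: int) -> int:
    """Hardware side (modeled): the plaintext word is recovered only at the
    last stage, just before execution, so memory holds only encoded words."""
    (w,) = struct.unpack_from(">I", encoded, pc)
    return w ^ key

key = secrets.randbits(32)                          # per-device key (assumption)
plain = struct.pack(">II", 0x01000000, 0x81C3E008)  # SPARC V8 (LEON): nop; retl
image = encode_program(plain, key)
assert fetch(image, 0, key) == 0x01000000
assert fetch(image, 4, key) == 0x81C3E008
```

Because the decode is a single XOR on the fetch path, the hardware change stays local to one stage, which matches the abstract's claim of localized changes at virtually no overhead.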
Huaijun Wang, Dingyi Fang, Guanghui Li, Na An, Xiaojiang Chen, Y. Gu
The VM (Virtual Machine)-based software protection technique provides an effective solution to protect software, making it extremely difficult to analyze and crack. In this paper, we improve it in two respects. First, time diversity counters cumulative attacks by making the software execute along different paths in different runs. Second, instructions along an execution path are transformed under a controlled deformation strategy that reduces the performance penalty. Finally, we design and develop a VM-based protection system with time diversity, named TDVMP, and evaluate it experimentally. The results show that the improvements are effective.
TDVMP: Improved Virtual Machine-Based Software Protection with Time Diversity. Huaijun Wang, Dingyi Fang, Guanghui Li, Na An, Xiaojiang Chen, Y. Gu. PPREW'14, 2014-01-22. https://doi.org/10.1145/2556464.2556468
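One way to picture the "time diversity" idea from the TDVMP abstract is a tiny stack VM in which each opcode has several semantically equivalent handlers, and one is chosen at random each time the instruction executes. Different runs then trace different handler paths while computing the same result. The handler variants and the random selection policy are hypothetical; the abstract does not say how TDVMP chooses its variant paths.

```python
import random

# Semantically equivalent handler variants (hypothetical examples).
def add_direct(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def add_via_sub(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a - (-b))

def sub_direct(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a - b)

def sub_via_add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + (-b))

HANDLERS = {"add": (add_direct, add_via_sub), "sub": (sub_direct, sub_via_add)}

def run(program, rng):
    """Interpret a tiny stack-VM program, picking one handler variant per
    executed instruction at random so that different runs take different paths."""
    stack, trace = [], []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
            continue
        handler = rng.choice(HANDLERS[op])
        trace.append(handler.__name__)
        handler(stack)
    return stack.pop(), trace

program = [("push", 7), ("push", 5), ("sub",), ("push", 3), ("add",)]
for seed in range(3):
    result, trace = run(program, random.Random(seed))
    print(result, trace)  # the result is always 5; the handler trace may vary
```

An attacker who aggregates traces across many runs (the cumulative attack the abstract mentions) sees a mix of paths rather than one stable handler sequence. The second improvement, transforming instructions under a controlled deformation strategy, is not modeled here.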