Who Changed You? Obfuscator Identification for Android
Yan Wang, A. Rountev
2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), 2017-05-20
DOI: 10.1109/MOBILESoft.2017.18 (https://doi.org/10.1109/MOBILESoft.2017.18)
Citations: 41
Abstract
Android developers commonly use app obfuscation to secure their apps and intellectual property. Although obfuscation provides protection, it presents an obstacle to a number of legitimate program analyses such as detection of app cloning and repackaging, malware detection, identification of third-party libraries, provenance analysis for digital forensics, and reverse engineering for test generation and performance analysis. If the obfuscator used to create an app can be identified, and if some details of the obfuscation process can be inferred, subsequent analyses can exploit this knowledge. Thus, it is desirable to be able to automatically analyze a given app and determine (1) whether it was obfuscated, (2) which obfuscator was used, and (3) how the obfuscator was configured. We have developed novel techniques to identify the obfuscator of an Android app for several widely used obfuscation tools and for a number of their configuration options. We define the obfuscator identification problem and propose a solution based on machine learning. To the best of our knowledge, this is the first work to formulate and solve this problem. We identify a feature vector that represents the characteristics of the obfuscated code. We then implement a tool that extracts this feature vector from Dalvik bytecode and uses it to identify the obfuscator provenance information. We evaluate the proposed approach on real-world Android apps obfuscated with different obfuscators, under several configurations. Our experiments indicate that the approach identifies the obfuscator with about 97% accuracy and recognizes the configuration with more than 90% accuracy.
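The abstract does not say which features or which learning algorithm the tool uses, so the following is only a minimal sketch of the general idea (feature vector in, obfuscator label out) under loudly stated assumptions. The three identifier-name features, the nearest-centroid classifier, the class labels, and the toy training lists are all hypothetical and are not taken from the paper; a real tool would read identifiers from Dalvik bytecode rather than from hard-coded lists.

```python
import math

def name_features(identifiers):
    """Map a list of identifier strings to a small numeric feature vector.

    Hypothetical features (not from the paper):
      [fraction of names with <= 2 chars, mean name length, fraction non-ASCII]
    """
    n = len(identifiers)
    if n == 0:
        return [0.0, 0.0, 0.0]
    short = sum(1 for s in identifiers if len(s) <= 2) / n
    mean_len = sum(len(s) for s in identifiers) / n
    non_ascii = sum(1 for s in identifiers if not s.isascii()) / n
    return [short, mean_len, non_ascii]

def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labeled_apps):
    """labeled_apps: list of (label, identifiers). Returns label -> centroid."""
    by_label = {}
    for label, identifiers in labeled_apps:
        by_label.setdefault(label, []).append(name_features(identifiers))
    return {label: centroid(vectors) for label, vectors in by_label.items()}

def predict(model, identifiers):
    """Return the label whose centroid is closest in Euclidean distance."""
    x = name_features(identifiers)
    return min(model, key=lambda label: math.dist(x, model[label]))

# Toy training data with hypothetical class labels (not real obfuscator names).
training = [
    ("unobfuscated", ["MainActivity", "onCreate", "loadUserProfile"]),
    ("unobfuscated", ["SettingsFragment", "saveSettings", "getItemCount"]),
    ("short_names", ["a", "b", "c", "aa"]),
    ("short_names", ["a", "b", "ab", "d"]),
    ("unicode_names", ["\u00e9\u00e8\u00ea\u00eb", "\u00f1\u00f2\u00f3\u00f4"]),
]
model = train(training)

print(predict(model, ["a", "c", "e", "ba", "f"]))
print(predict(model, ["LoginActivity", "onResume"]))
```

The features here are left unscaled, so mean name length dominates the distance; a practical classifier would normalize features and use far more training apps. The same feature-then-classify structure could in principle be reused for the configuration question by training on configuration labels instead of obfuscator labels.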