Evading Anti-Phishing Models: A Field Note Documenting an Experience in the Machine Learning Security Evasion Competition 2022
Yang Gao, Benjamin Ampel, S. Samtani
Digital Threats: Research and Practice, 2023-06-15. DOI: https://doi.org/10.1145/3603507

Although machine learning-based anti-phishing detectors have shown promising results in phishing website detection, they remain vulnerable to evasion attacks. The Machine Learning Security Evasion Competition 2022 (MLSEC 2022) gave researchers and practitioners the opportunity to deploy evasion attacks against anti-phishing machine learning models in a real-world setting. In this field note, we share our experience participating in MLSEC 2022. We manipulated the source code of ten phishing HTML pages provided by the competition, using obfuscation techniques to evade the anti-phishing models. Our evasion attacks, which employed a benign overlap strategy, achieved third place in the competition with 46 of a possible 80 points. The results of our MLSEC 2022 performance can offer valuable insights for research seeking to robustify machine learning-based anti-phishing detectors.
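The abstract names two manipulations: blending benign content into the phishing page ("benign overlap") and obfuscating suspicious strings. The competition entries themselves are not reproduced here; the sketch below only illustrates the general idea under assumed, simplified transformations (the function names and the sample page are hypothetical, not the authors' code).

```python
import base64

def benign_overlap(phishing_html: str, benign_html: str) -> str:
    """Blend benign page content into a phishing page so that a
    feature-based detector sees mostly benign material. The benign
    text is wrapped in an invisible <div>, so the rendered page is
    unchanged while the raw HTML now overlaps a benign page."""
    hidden = '<div style="display:none">{}</div>'.format(benign_html)
    return phishing_html.replace("</body>", hidden + "</body>")

def obfuscate_keyword(html: str, keyword: str) -> str:
    """Replace a suspicious keyword with a script that decodes it at
    render time, hiding the literal string from static analysis."""
    encoded = base64.b64encode(keyword.encode()).decode()
    script = '<script>document.write(atob("{}"));</script>'.format(encoded)
    return html.replace(keyword, script)

page = "<html><body><p>Verify your password now</p></body></html>"
page = obfuscate_keyword(page, "password")
page = benign_overlap(page, "<p>Welcome to our weather service.</p>")
```

After both transformations the literal word "password" no longer appears in the HTML, while a browser would still render it; a static lexical detector sees the injected benign text instead.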
Asm2Seq: Explainable Assembly Code Functional Summary Generation for Reverse Engineering and Vulnerability Analysis
Scarlett Taviss, Steven H. H. Ding, Mohammad Zulkernine, P. Charland, Sudipta Acharya
Digital Threats: Research and Practice, 2023-05-18. DOI: https://doi.org/10.1145/3592623

Reverse engineering is the process of understanding the inner workings of a software system without access to its source code. It is critical for firmware security validation, software vulnerability research, and malware analysis, but it often requires substantial manual effort. Recently, data-driven solutions have been proposed to reduce this effort by identifying code clones at the assembly or source level. However, security analysts must still understand the matched assembly or source code to develop an understanding of the functionality, and such approaches assume that a matched candidate always exists. This research bridges the gap by introducing the problem of assembly code summarization. Given assembly code as input, we propose a machine learning-based system that produces human-readable summaries of its functionality in the context of code vulnerability analysis. We generate the first assembly-code-to-function-summary dataset and propose to leverage the encoder-decoder architecture. With the attention mechanism, it is possible to understand which aspects of the assembly code had the largest impact on generating the summary. Our experiments show that the proposed solution achieves high accuracy and a high Bilingual Evaluation Understudy (BLEU) score. Finally, we performed case studies on real-life CVE vulnerability cases to better understand the proposed method's performance and practical implications.
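Generated summaries are scored with BLEU, which the abstract mentions but does not define. As a reminder of what that metric measures, here is a minimal sentence-level BLEU in plain Python (geometric mean of smoothed modified n-gram precisions times a brevity penalty); it is a textbook sketch, not the paper's evaluation code, and the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions, scaled by a brevity penalty. Add-one smoothing keeps
    the score nonzero when a higher-order n-gram has no match."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # smoothed
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * geo_mean

cand = "copies the source buffer without checking its length".split()
ref = "copies the source buffer without checking the length".split()
score = bleu(cand, ref)
```

A candidate that differs from the reference by one token scores below 1.0 but well above 0, which is why BLEU is a useful graded signal for summary quality.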
Improving Automated Labeling for ATT&CK Tactics in Malware Threat Reports
Eva Domschot, Ramyaa Ramyaa, Michael R. Smith
Digital Threats: Research and Practice, 2023-05-17. DOI: https://doi.org/10.1145/3594553

Once novel malware is detected, threat reports are written by the security companies that discover it. These reports often vary in the terminology used to describe the malware's behavior, making it difficult to compare reports of the same malware from different companies. To aid the automated discovery of novel malware, it was recently proposed that novel malware could be detected by identifying behaviors, under the assumption that a core set of behaviors is present in most, if not all, malware variants. However, there is a lack of malware datasets labeled with behaviors. Motivated by the need to label malware with a common set of behaviors, this work examines automating the process of labeling malware with the behaviors identified in threat reports, despite the variability of terminology. To do so, we examine several techniques from the natural language processing (NLP) domain. We find that most state-of-the-art word embedding methods require large amounts of data and are trained on generic text corpora, missing the nuances of information security. To address this, we use simple feature selection techniques. We find that simple feature selection techniques generally outperform the word embedding methods and achieve a 6% increase in F0.5-score over prior work when predicting MITRE ATT&CK tactics in threat reports. Our work indicates that feature selection, which is commonly overlooked in favor of sophisticated NLP methods, is beneficial for information security tasks, where more sophisticated NLP methodologies fail to pick out relevant security terms.
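The abstract does not specify which feature selection statistic the authors used, so the following is only a generic illustration of the approach: scoring terms by a chi-squared statistic for association with a tactic label, over a tiny invented corpus of report sentences (the data and label names are hypothetical).

```python
def chi2_scores(docs, labels, target):
    """Score each term by a one-degree-of-freedom chi-squared statistic
    measuring its association with the target label. Each doc is a set
    of terms; high-scoring terms are strong (dis)criminators."""
    n = len(docs)
    vocab = {t for doc in docs for t in doc}
    scores = {}
    for term in vocab:
        a = sum(1 for doc, l in zip(docs, labels) if term in doc and l == target)
        b = sum(1 for doc, l in zip(docs, labels) if term in doc and l != target)
        c = sum(1 for doc, l in zip(docs, labels) if term not in doc and l == target)
        d = n - a - b - c
        num = n * (a * d - b * c) ** 2
        den = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = num / den if den else 0.0
    return scores

# Hypothetical mini-corpus of report sentences labeled with a tactic.
docs = [{"scheduled", "task", "persists"},
        {"registry", "run", "key"},
        {"exfiltrates", "data", "over", "dns"},
        {"sends", "data", "to", "c2"}]
labels = ["persistence", "persistence", "exfiltration", "exfiltration"]
scores = chi2_scores(docs, labels, "persistence")
```

Terms that split cleanly along the label boundary (here, "data") receive the highest scores, which is the property that lets a small labeled corpus surface security-specific vocabulary that generic embeddings miss.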
Oleg Boyarchuk, Sebastiano Mariani, Stefano Ortolani, G. Vigna
Throughout its eight-year history, Emotet has caused substantial damage. This threat reappeared at the beginning of 2022 following a take-down by law enforcement in November 2021. Emotet is arguably one of the most notorious advanced persistent threats, causing substantial damage during its earlier phases and continuing to pose a danger to organizations everywhere. In this article, we present a longitudinal study of several waves of Emotet-based attacks that we observed in VMware’s customer telemetry. By analyzing Emotet’s software development life cycle, we were able to dissect how it quickly changes its command and control (C2) infrastructure, obfuscates its configuration, adapts and tests its evasive execution chains, deploys different attack vectors at different stages, laterally propagates, and continues to evolve using numerous tactics and techniques.
{"title":"Keeping Up with the Emotets: Tracking a Multi-infrastructure Botnet","authors":"Oleg Boyarchuk, Sebastiano Mariani, Stefano Ortolani, G. Vigna","doi":"10.1145/3594554","DOIUrl":"https://doi.org/10.1145/3594554","url":null,"abstract":"Throughout its eight-year history, Emotet has caused substantial damage. This threat reappeared at the beginning of 2022 following a take-down by law enforcement in November 2021. Emotet is arguably one of the most notorious advanced persistent threats, causing substantial damage during its earlier phases and continuing to pose a danger to organizations everywhere. In this article, we present a longitudinal study of several waves of Emotet-based attacks that we observed in VMware’s customer telemetry. By analyzing Emotet’s software development life cycle, we were able to dissect how it quickly changes its command and control (C2) infrastructure, obfuscates its configuration, adapts and tests its evasive execution chains, deploys different attack vectors at different stages, laterally propagates, and continues to evolve using numerous tactics and techniques.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130468258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
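Tracking how quickly a botnet rotates its C2 infrastructure, as the article does across attack waves, amounts to diffing the server lists extracted from successive configuration dumps. A minimal sketch of that comparison follows; the addresses are RFC 5737 documentation examples, not real Emotet infrastructure, and the churn metric is an illustrative choice rather than the article's.

```python
def c2_churn(old_wave: set, new_wave: set) -> dict:
    """Compare the C2 server lists extracted from two configuration
    dumps and quantify how much of the infrastructure was rotated."""
    added, dropped = new_wave - old_wave, old_wave - new_wave
    return {
        "added": sorted(added),
        "dropped": sorted(dropped),
        "kept": sorted(old_wave & new_wave),
        # Fraction of all observed servers that changed between waves.
        "churn": len(added | dropped) / len(old_wave | new_wave),
    }

# Hypothetical C2 lists from two observed waves.
wave1 = {"192.0.2.10", "192.0.2.11", "198.51.100.5"}
wave2 = {"192.0.2.11", "203.0.113.7", "203.0.113.8"}
report = c2_churn(wave1, wave2)
```

A churn value near 1.0 over a short interval is the kind of signal that indicates aggressive infrastructure rotation between waves.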
Zero Trust Architecture: Risk Discussion
Alan Levine, B. Tucker
Digital Threats: Research and Practice, 2023-03-31. DOI: https://doi.org/10.1145/3573892

Implemented well, Zero Trust Architecture (ZTA) promises to mitigate cyber risk for organizations of all sizes, risk postures, and cybersecurity maturity states. However, ZTA development, deployment, and operation present challenges that may hinder full adoption and sustained effectiveness, and may create new risk. Cyber risk should be evaluated by organizations as they make their decision for or against ZTA. Then, as organizations work toward…
Towards a Greater Understanding of Coordinated Vulnerability Disclosure Policy Documents
T. Walshe, Andrew C. Simpson
Digital Threats: Research and Practice, 2023-03-23. DOI: https://doi.org/10.1145/3586180

Bug bounty programmes and vulnerability disclosure programmes, collectively referred to as Coordinated Vulnerability Disclosure (CVD) programmes, open up an organisation's assets to the inquisitive gaze of (often eager) white-hat hackers. Motivated by the question "What information do organisations convey to hackers through public CVD policy documents?", we aim to better understand the information available to hackers wishing to participate in the search for vulnerabilities. We consider three key issues. First, to address the differences in the legal language communicated to hackers, it is necessary to understand the formal constraints by which hackers must abide. Second, it is beneficial to understand the variation in the informal constraints communicated to hackers through a variety of institutional elements. Third, for organisations wishing to better understand the commonplace elements of current policy documents, we offer a broad analysis of the components frequently included therein and identify gaps in programme policies. We report the results of a quantitative study, leveraging deep learning-based natural language processing models, that provides insights into the policy documents accompanying the CVD programmes of thousands of organisations, covering both stand-alone programmes and those hosted on 13 bug bounty platforms. We found that organisations often inadequately convey the formal constraints applicable to hackers, requiring hackers to have a deep understanding of the laws that underpin safe and legal security research. Furthermore, a lack of standardisation across similar policy components is prevalent and may lead to a decreased understanding of the informal constraints placed upon hackers when searching for and disclosing vulnerabilities. Analysis of the institutional elements included in the policy documents reveals insufficient inclusion of many key components: legal information and information pertaining to restrictions on the backgrounds of hackers are absent from a majority of the policies analysed. Finally, to assist ongoing research, we provide novel annotated policy datasets that include human-labelled annotations at both the sentence and paragraph level, covering a broad range of CVD programme backgrounds.
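The study labels policy sentences with institutional components using deep NLP models. As a much simpler stand-in that conveys the task, the sketch below tags sentences by keyword lookup; the component names, keyword lists, and example sentences are all invented for illustration and bear no relation to the paper's taxonomy or models.

```python
# Hypothetical component keywords -- a simplistic stand-in for a
# trained sentence classifier over CVD policy text.
COMPONENT_KEYWORDS = {
    "safe_harbor": {"legal action", "good faith", "authorized"},
    "scope": {"in scope", "out of scope", "domains"},
    "reward": {"bounty", "reward", "payment"},
}

def label_sentence(sentence: str) -> list:
    """Return the policy components whose keywords occur in the sentence."""
    text = sentence.lower()
    return sorted(component for component, keywords in COMPONENT_KEYWORDS.items()
                  if any(k in text for k in keywords))

policy = [
    "Research conducted in good faith will not lead to legal action.",
    "Only the domains listed below are in scope.",
    "We do not operate a paid bounty programme.",
]
labels = [label_sentence(s) for s in policy]
```

Even this crude baseline shows why sentence-level annotation is the natural granularity: each component tends to live in its own sentence of the policy document.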
ANDROIDGYNY: Reviewing clustering techniques for Android malware family classification
Thalita Scharr Rodrigues Pimenta, Fabrício Ceschin, A. Grégio
Digital Threats: Research and Practice, 2023-03-14. DOI: https://doi.org/10.1145/3587471

Thousands of malicious applications (apps) are created daily, modified with the aid of automation tools, and released on the World Wide Web. Several techniques have been applied over the years to identify whether an APK is malicious or not. These techniques aim to identify unknown malware mainly by calculating the similarity of a sample to previously grouped, already known families of malicious apps. High accuracy rates would thus enable several countermeasures: from quicker detection to the development of vaccines and aid in the reverse engineering of new variants. However, most of the literature consists of limited experiments, either short-term and offline or based exclusively on well-known families of malicious apps. In this paper, we explore the use of malware phylogeny, a term borrowed from biology that refers to the genealogical study of the relationships between elements and families. We also survey the literature on clustering techniques applied to mobile malware classification and discuss how researchers have been setting up their experiments.
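Family classification by similarity to known groups, as described above, can be sketched with set similarity over extracted app features. The following toy example uses Jaccard similarity on permission sets with greedy single-linkage grouping; the threshold, feature choice, and sample names are illustrative assumptions, not any surveyed technique in particular.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity between two apps' feature sets (e.g., permissions)."""
    return len(a & b) / len(a | b) if a or b else 1.0

def cluster(apps: dict, threshold: float = 0.5) -> list:
    """Greedy single-linkage grouping: an app joins the first family
    containing a member at least `threshold`-similar to it."""
    families = []
    for name, feats in apps.items():
        for family in families:
            if any(jaccard(feats, apps[member]) >= threshold for member in family):
                family.append(name)
                break
        else:
            families.append([name])
    return families

# Hypothetical permission profiles for four APKs.
apps = {
    "sample_a": {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    "sample_b": {"SEND_SMS", "READ_CONTACTS", "INTERNET", "WAKE_LOCK"},
    "sample_c": {"CAMERA", "RECORD_AUDIO"},
    "sample_d": {"CAMERA", "RECORD_AUDIO", "INTERNET"},
}
families = cluster(apps)
```

The two SMS-stealer-like samples group together and the two spyware-like samples group together, illustrating how similarity to an existing family can label an unknown variant.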
Introduction to the Special Issue on the Digital Threats of Hardware Security
Aydin Aysu, S. Graham
Digital Threats: Research and Practice, 2023-03-07. DOI: https://doi.org/10.1145/3585011

Digital threats to computing and network security continue their relentless advance, with malware growing in sophistication and frequently targeting the lower levels of the computing stack. Recently, these threats have evolved and started to target hardware vulnerabilities. Such attacks are hard to detect at higher abstraction levels and even harder to mitigate, given the challenges of changing the hardware infrastructure. To be effective, defensive measures must also consider the physical effects of computing at the lower levels of the hardware stack.
Breaking Captcha System with Minimal Exertion through Deep Learning: Real-time Risk Assessment on Indian Government Websites
Rajat Subhra Bhowmick, Rahul Indra, Isha Ganguli, Jayanta Paul, J. Sil
Digital Threats: Research and Practice, 2023-02-23. DOI: https://doi.org/10.1145/3584974

Captchas are used to prevent computer bots from launching spam attacks and from automatically extracting data available on websites. Government websites mostly contain sensitive data related to the citizens and assets of the country, so vulnerability in their captcha systems raises a major security challenge. The proposed work focuses on the real-time captcha systems used by government websites of India and identifies their risk levels. To effectively analyze captcha security, we approach the problem from an attacker's perspective, for whom building an effective solver from scratch, with limited feature-engineering knowledge of text and image processing, is a challenge. Neural network models are useful for automated feature extraction, and a simple model can be trained with a minimal number of manually annotated real captchas. Along with popular text captchas, government websites of India use text instruction-based captchas. We analyze an effective neural network pipeline for solving text captchas. Text instruction captchas are relatively new, and this work provides novel end-to-end neural network architectures for breaking different types of them. The proposed models achieve more than 80% accuracy and have a maximum inference time of 1.063 seconds on a desktop GPU. The study proposes an ecosystem and procedure for rating the overall risk of a captcha system used on a website. We observe that, given the importance of the information available on these government websites, the effort required for an attacker to solve their captcha systems is alarmingly low.
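The study's risk-rating procedure combines how easily a site's captcha is solved with how sensitive the protected data is. The sketch below shows one way such a rating could be composed; the weights, the 0-1 inputs, and the bucket cutoffs are invented for illustration and do not reproduce the article's actual procedure.

```python
def captcha_risk(solver_accuracy: float, seconds_per_solve: float,
                 data_sensitivity: float) -> str:
    """Combine solver accuracy, solving speed, and data sensitivity
    (each roughly on a 0-1 scale) into a coarse risk bucket.
    Weights and cutoffs are illustrative assumptions."""
    speed = 1.0 / (1.0 + seconds_per_solve)  # faster solves -> higher risk
    score = 0.5 * solver_accuracy + 0.2 * speed + 0.3 * data_sensitivity
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"

# A solver like the one described: >80% accuracy, ~1.063 s per captcha,
# protecting highly sensitive citizen data.
rating = captcha_risk(solver_accuracy=0.8, seconds_per_solve=1.063,
                      data_sensitivity=0.9)
```

With an accurate, fast solver guarding highly sensitive data, any reasonable weighting lands in the top bucket, which matches the study's alarming conclusion.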
OATs’inside: Retrieving Object Behaviors From Native-based Obfuscated Android Applications
Pierre Graux, Jean-François Lalande, Valérie Viet Triem Tong, Pierre Wilke
Digital Threats: Research and Practice, 2023-02-22. DOI: https://doi.org/10.1145/3584975

Analyzing Android applications is essential for reviewing proprietary code and understanding malware behaviors. However, Android applications use obfuscation techniques to slow down this process, and these techniques are increasingly based on native code. In this article, we propose OATs’inside, a new analysis tool that focuses on high-level behaviors to transparently circumvent native obfuscation techniques. The targeted high-level behaviors are object-level behaviors, i.e., actions performed on Java objects (e.g., field accesses, method calls), regardless of whether these actions are performed from Java or native code. Our system uses a hybrid approach based on dynamic monitoring and trace-based symbolic execution to output a control flow graph (CFG) for each method of the analyzed application. CFGs are composed of Java-like actions enriched with condition expressions and dataflows between actions, giving an understandable representation of any code, even code that is fully native. OATs’inside spares users the need to dive into low-level instructions, which are difficult to reverse engineer. We extensively compare the functionality of OATs’inside against state-of-the-art tools to highlight its benefits when observing native operations. Our experiments are conducted on a real smartphone: we discuss the performance impact of OATs’inside and demonstrate its practical use on applications containing anti-debugging techniques provided by the OWASP foundation. We also evaluate the robustness of OATs’inside on unit tests obfuscated with the Tigress obfuscator.