{"title":"Precisely Extracting Complex Variable Values from Android Apps","authors":"Marc Miltenberger, Steven Arzt","doi":"10.1145/3649591","DOIUrl":null,"url":null,"abstract":"<p>Millions of users nowadays rely on their smartphones to process sensitive data through apps from various vendors and sources. Therefore, it is vital to assess these apps for security vulnerabilities and privacy violations. Information such as to which server an app connects through which protocol, and which algorithm it applies for encryption are usually encoded as variable values and arguments of API calls. However, extracting these values from an app is not trivial. The source code of an app is usually not available, and manual reverse engineering is cumbersome with binary sizes in the tens of megabytes. Current automated tools, on the other hand, cannot retrieve values that are computed at runtime through complex transformations. </p><p>In this paper, we present <span>ValDroid</span>, a novel static analysis tool for automatically extracting the set of possible values for a given variable at a given statement in the Dalvik byte code of an Android app. We evaluate <span>ValDroid</span>\nagainst existing approaches (JSA, Violist, DroidRA, Harvester, BlueSeal, StringHound, IC3, COAL) on benchmarks and 794 real-world apps. <span>ValDroid</span> greatly outperforms existing tools. It provides an average <i>F</i><sub>1</sub> score of more than 90%, while only requiring 0.1 seconds per value on average. For many data types including Network Connections and Dynamic Code Loading, its recall is more than twice the recall of the best existing approaches.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"5 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3649591","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Millions of users nowadays rely on their smartphones to process sensitive data through apps from various vendors and sources. Therefore, it is vital to assess these apps for security vulnerabilities and privacy violations. Information such as to which server an app connects through which protocol, and which algorithm it applies for encryption are usually encoded as variable values and arguments of API calls. However, extracting these values from an app is not trivial. The source code of an app is usually not available, and manual reverse engineering is cumbersome with binary sizes in the tens of megabytes. Current automated tools, on the other hand, cannot retrieve values that are computed at runtime through complex transformations.
In this paper, we present ValDroid, a novel static analysis tool for automatically extracting the set of possible values for a given variable at a given statement in the Dalvik byte code of an Android app. We evaluate ValDroid
against existing approaches (JSA, Violist, DroidRA, Harvester, BlueSeal, StringHound, IC3, COAL) on benchmarks and 794 real-world apps. ValDroid greatly outperforms existing tools. It provides an average F1 score of more than 90%, while only requiring 0.1 seconds per value on average. For many data types including Network Connections and Dynamic Code Loading, its recall is more than twice the recall of the best existing approaches.
期刊介绍:
Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.