Title: MarvelHideDroid: Reliable on-the-fly data anonymization based on Android virtualization
Authors: Francesco Pagano, Luca Verderame, Enrico Russo, Alessio Merlo
Journal: Computers & Electrical Engineering, volume 121, Article 109882 (JCR Q1, Computer Science, Hardware & Architecture; impact factor 4.0)
Publication date: 2024-11-27
Publication type: Journal Article
DOI: 10.1016/j.compeleceng.2024.109882
URL: https://www.sciencedirect.com/science/article/pii/S0045790624008085
Citations: 0
Abstract
Modern mobile applications collect many user-generated events during execution through dedicated libraries called analytic libraries. Collecting such events gives app developers helpful information for improving the app. The same events are also an essential source of information for analytic library providers (e.g., Google and Meta) to understand users’ preferences. However, the user is not involved in this process. To counteract this problem, proposals have emerged from both legal (e.g., the General Data Protection Regulation (GDPR)) and research perspectives. On the research side, several efforts have defined solutions for the Android ecosystem that either limit the gathering of such data before the analytic libraries collect it or give the user control of the process. To this aim, HideDroid was the first proposal to let the user define a different privacy level for each app installed on the device by leveraging k-anonymity and differential privacy techniques. Subsequently, VirtualHideDroid extended HideDroid by applying the same approach to virtualized Android environments, in which an application (plugin) can run within another application (container). In this scenario, VirtualHideDroid runs as the container app and anonymizes the user event data. However, according to standard threat models for virtualized Android environments, assuming that the container app is fully trusted is too optimistic in real deployments.
For this reason, in this paper, we extend the original VirtualHideDroid work by assuming that the tool itself may be untrusted, i.e., controlled by an external attacker who has access to the container app and therefore full access to the user data. To solve this problem, we define a new approach, named MarvelHideDroid, which provides reliable anonymization of event data in the plugin app, even in the event of a malicious or compromised container. Moreover, unlike VirtualHideDroid, MarvelHideDroid relies on an LLM to automatically build the generalizations required by k-anonymity, resulting in an anonymization strategy that is more robust to changes in the data structure of the events captured by the analytic libraries. We empirically demonstrate the viability and reliability of the proposal by testing an implementation of MarvelHideDroid on a set of real Android apps in a virtualized environment.
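The role of generalizations in k-anonymity can be illustrated with a minimal sketch. It is not the authors' implementation: the hierarchy `GENERALIZATION`, the event names, and the helpers `generalize`, `is_k_anonymous`, and `anonymize` are all hypothetical, the hierarchy is written by hand rather than produced by an LLM, only a single attribute (the event name) is considered, and differential privacy is left out.

```python
from collections import Counter

# Hypothetical, hand-written generalization hierarchy (child -> coarser parent).
# The abstract says MarvelHideDroid builds generalizations with an LLM;
# this fixed table is only an illustrative stand-in.
GENERALIZATION = {
    "add_to_cart": "shopping",
    "purchase": "shopping",
    "view_item": "browsing",
    "search": "browsing",
    "shopping": "*",
    "browsing": "*",
}


def generalize(value, level):
    """Climb `level` steps up the hierarchy; unknown values map to the root '*'."""
    for _ in range(level):
        value = GENERALIZATION.get(value, "*")
    return value


def is_k_anonymous(values, k):
    """Return True if every distinct value occurs at least k times."""
    return all(count >= k for count in Counter(values).values())


def anonymize(events, k, max_level=2):
    """Return the least-generalized event names that satisfy k-anonymity."""
    for level in range(max_level + 1):
        candidate = [generalize(event, level) for event in events]
        if is_k_anonymous(candidate, k):
            return candidate
    raise ValueError("k-anonymity not reachable within max_level")


if __name__ == "__main__":
    events = ["add_to_cart", "purchase", "view_item", "search"]
    print(anonymize(events, k=2))
```

With `k=2`, the four distinct event names above are each unique, so the sketch climbs one level and releases two `shopping` and two `browsing` events. A hierarchy fixed by hand like this one stops matching as soon as an analytic library renames or restructures its events, which is the fragility the abstract says automatically built generalizations are meant to reduce.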
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.