Haoran Lu, Qingchuan Zhao, Yongliang Chen, Xiaojing Liao, Zhiqiang Lin
{"title":"Detecting and Measuring Aggressive Location Harvesting in Mobile Apps via Data-flow Path Embedding","authors":"Haoran Lu, Qingchuan Zhao, Yongliang Chen, Xiaojing Liao, Zhiqiang Lin","doi":"10.1145/3606376.3593535","DOIUrl":null,"url":null,"abstract":"Today, location-based services have become prevalent in the mobile platform, where mobile apps provide specific services to a user based on his or her location. Unfortunately, mobile apps can aggressively harvest location data with much higher accuracy and frequency than they need because the coarse-grained access control mechanism currently implemented in mobile operating systems (e.g., Android) cannot regulate such behavior. This unnecessary data collection violates the data minimization policy, yet no previous studies have investigated privacy violations from this perspective, and existing techniques are insufficient to address this violation. To fill this knowledge gap, we take the first step toward detecting and measuring this privacy risk in mobile apps at scale. Particularly, we annotate and release the first dataset to characterize those aggressive location harvesting apps and understand the challenges of automatic detection and classification. Next, we present a novel system, LocationScope, to address these challenges by (i) uncovering how an app collects locations and how to use such data through a fine-tuned value set analysis technique, (ii) recognizing the fine-grained location-based services an app provides via embedding data-flow paths, which is a combination of program analysis and machine learning techniques, extracted from its location data usages, and (iii) identifying aggressive apps with an outlier detection technique achieving a precision of 97% in aggressive app detection. Our technique has further been applied to millions of free Android apps from Google Play as of 2019 and 2021. Highlights of our measurements on detected aggressive apps include their growing trend from 2019 to 2021 and the app generators' significant contribution of aggressive location harvesting apps.","PeriodicalId":35745,"journal":{"name":"Performance Evaluation Review","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Performance Evaluation Review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3606376.3593535","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0
Abstract
Today, location-based services have become prevalent in the mobile platform, where mobile apps provide specific services to a user based on his or her location. Unfortunately, mobile apps can aggressively harvest location data with much higher accuracy and frequency than they need because the coarse-grained access control mechanism currently implemented in mobile operating systems (e.g., Android) cannot regulate such behavior. This unnecessary data collection violates the data minimization policy, yet no previous studies have investigated privacy violations from this perspective, and existing techniques are insufficient to address this violation. To fill this knowledge gap, we take the first step toward detecting and measuring this privacy risk in mobile apps at scale. Particularly, we annotate and release the first dataset to characterize those aggressive location harvesting apps and understand the challenges of automatic detection and classification. Next, we present a novel system, LocationScope, to address these challenges by (i) uncovering how an app collects locations and how to use such data through a fine-tuned value set analysis technique, (ii) recognizing the fine-grained location-based services an app provides via embedding data-flow paths, which is a combination of program analysis and machine learning techniques, extracted from its location data usages, and (iii) identifying aggressive apps with an outlier detection technique achieving a precision of 97% in aggressive app detection. Our technique has further been applied to millions of free Android apps from Google Play as of 2019 and 2021. Highlights of our measurements on detected aggressive apps include their growing trend from 2019 to 2021 and the app generators' significant contribution of aggressive location harvesting apps.