Scaling health analytics to millions without compromising privacy using deep distributed behavior models
Petar Velickovic, N. Lane, S. Bhattacharya, A. Chieh, O. Bellahsen, M. Vegreville
Proceedings of the 11th EAI International Conference on Pervasive Computing Technologies for Healthcare, 2017-05-23
DOI: 10.1145/3154862.3154873
Citations: 5
Abstract
People are naturally sensitive to the sharing of their health data, collected by various connected consumer devices (e.g., smart scales, sleep trackers), with third parties. However, sharing this data to compute aggregate statistics and comparisons is a basic building block for a range of medical studies based on large-scale consumer devices; such studies have the potential to transform how we study disease and behavior. Furthermore, informing users as to how their health measurements and activities compare with those of friends, demographic peers, and the global population has been shown to be a powerful tool for behavior change and management in individuals. While experienced organizations can safely perform aggregate user health analysis, there is a significant need for new privacy-preserving mechanisms that enable people to engage in the same way even with untrusted third parties (e.g., small or recently established organizations). In this work, we propose a new approach to this problem grounded in the use of deep distributed behavior models. These are discriminative deep learning models that can approximate the calculation of various aggregate functions. Models are bootstrapped with training data from a modestly sized cohort and then distributed directly to personal devices to estimate, for example, how the user ranks (perhaps in terms of daily step counts) against various demographic ranges (like age and sex). Critically, the user's own data now never has to leave the device. We validate this method using a 1.2M-user, 22-month dataset spanning body weight, sleep hours and step counts collected by devices from Nokia Digital Health - Withings. Experiments show our framework remains accurate for a range of commonly used statistical aggregate functions. This result opens a powerful new paradigm for privacy-preserving analytics under which user data largely remains on personal devices, overcoming a variety of potential privacy risks.
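The core idea — train a discriminative model server-side on a modest cohort to approximate an aggregate function, then ship only the model parameters to the device so the user's raw measurement never leaves it — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic step-count cohort, the tiny one-hidden-layer network, and the age-banded percentile target are all assumptions standing in for the deeper models and real Withings data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Server side: bootstrap a behavior model from a modestly sized cohort ---
# Hypothetical synthetic cohort: daily step counts whose mean drifts with age.
n = 5000
age = rng.integers(20, 70, size=n).astype(float)
steps = rng.normal(9000 - 60 * age, 1500)

def empirical_rank(s, a):
    """Exact percentile rank of step count s among peers within 5 years of age a.
    Computed on the cohort only to produce training labels."""
    peers = steps[np.abs(age - a) <= 5]
    return float((peers < s).mean())

# Normalized inputs and percentile-rank targets.
X = np.column_stack([steps / 20000.0, age / 100.0])
y = np.array([empirical_rank(s, a) for s, a in zip(steps, age)])

# Tiny one-hidden-layer MLP trained by full-batch gradient descent -- a
# stand-in for the deeper discriminative models described in the paper.
h = 16
W1 = rng.normal(0, 0.5, (2, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.5, h);      b2 = 0.0
for _ in range(2000):
    z = np.tanh(X @ W1 + b1)           # hidden activations
    err = (z @ W2 + b2) - y            # gradient of 0.5 * mean squared error
    gW2 = z.T @ err / n; gb2 = err.mean()
    dz = np.outer(err, W2) * (1 - z ** 2)
    gW1 = X.T @ dz / n; gb1 = dz.mean(0)
    W1 -= 0.5 * gW1; b1 -= 0.5 * gb1
    W2 -= 0.5 * gW2; b2 -= 0.5 * gb2

# --- Device side: only the parameters (W1, b1, W2, b2) are distributed;
# the user's own step count is evaluated locally and never uploaded. ---
def on_device_rank(user_steps, user_age):
    x = np.array([user_steps / 20000.0, user_age / 100.0])
    return float(np.tanh(x @ W1 + b1) @ W2 + b2)

est = on_device_rank(7200, 30)          # estimated percentile among age peers
true = empirical_rank(7200, 30)         # cohort-side ground truth, for comparison
```

The privacy property comes from the direction of data flow: after bootstrapping, inference runs entirely on-device, so the third party sees model parameters going out but no raw measurements coming back.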