Personal information inference from voice recordings: User awareness and privacy concerns
Pub Date: 2021-11-20 DOI: 10.2478/popets-2022-0002 (PoPETs 2022, pp. 6–27)
Jacob Leon Kröger, L. Gellrich, Sebastian Pape, S.R. Brause, Stefan Ullrich
Abstract Through voice characteristics and manner of expression, even seemingly benign voice recordings can reveal sensitive attributes about a recorded speaker (e.g., geographical origin, health status, personality). We conducted a nationally representative survey in the UK (n = 683, 18–69 years) to investigate people’s awareness of the inferential power of voice and speech analysis. Our results show that – while awareness levels vary between different categories of inferred information – there is generally low awareness across all participant demographics, even among participants with professional experience in computer science, data mining, and IT security. For instance, only 18.7% of participants are at least somewhat aware that physical and mental health information can be inferred from voice recordings. Many participants have rarely (28.4%) or never (42.5%) even thought about the possibility of personal information being inferred from speech data. After a short educational video on the topic, participants express only moderate privacy concern. However, based on an analysis of open-text responses, unconcerned reactions seem to be largely explained by knowledge gaps about possible data misuses. Watching the educational video lowered participants’ intention to use voice-enabled devices. In discussing the regulatory implications of our findings, we challenge the notion of “informed consent” to data processing. We also argue that inferences about individuals need to be legally recognized as personal data and protected accordingly.
{"title":"Personal information inference from voice recordings: User awareness and privacy concerns","authors":"Jacob Leon Kröger, L. Gellrich, Sebastian Pape, S.R. Brause, Stefan Ullrich","doi":"10.2478/popets-2022-0002","DOIUrl":"https://doi.org/10.2478/popets-2022-0002","url":null,"abstract":"Abstract Through voice characteristics and manner of expression, even seemingly benign voice recordings can reveal sensitive attributes about a recorded speaker (e. g., geographical origin, health status, personality). We conducted a nationally representative survey in the UK (n = 683, 18–69 years) to investigate people’s awareness about the inferential power of voice and speech analysis. Our results show that – while awareness levels vary between different categories of inferred information – there is generally low awareness across all participant demographics, even among participants with professional experience in computer science, data mining, and IT security. For instance, only 18.7% of participants are at least somewhat aware that physical and mental health information can be inferred from voice recordings. Many participants have rarely (28.4%) or never (42.5%) even thought about the possibility of personal information being inferred from speech data. After a short educational video on the topic, participants express only moderate privacy concern. However, based on an analysis of open text responses, unconcerned reactions seem to be largely explained by knowledge gaps about possible data misuses. Watching the educational video lowered participants’ intention to use voice-enabled devices. In discussing the regulatory implications of our findings, we challenge the notion of “informed consent” to data processing. We also argue that inferences about individuals need to be legally recognized as personal data and protected accordingly.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"6 - 27"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42251717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
(ε, δ)-Indistinguishable Mixing for Cryptocurrencies
Pub Date: 2021-11-20 DOI: 10.2478/popets-2022-0004 (PoPETs 2022, pp. 49–74)
Mingyu Liang, Ioanna Karantaidou, Foteini Baldimtsi, S. D. Gordon, Mayank Varia
Abstract We propose a new theoretical approach for building anonymous mixing mechanisms for cryptocurrencies. Rather than requiring a fully uniform permutation during mixing, we relax the requirement, insisting only that neighboring permutations are similarly likely. This is defined formally by borrowing from the definition of differential privacy. This relaxed privacy definition allows us to greatly reduce the amount of interaction and computation in the mixing protocol. Our construction achieves O(n·polylog(n)) computation time for mixing n addresses, whereas all other mixing schemes require O(n²) total computation across all parties. Additionally, we support a smooth tolerance of fail-stop adversaries and do not require any trusted setup. We analyze the security of our generic protocol under the UC framework, and under a stand-alone, game-based definition. We finally describe an instantiation using ring signatures and confidential transactions.
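Spelled out, the relaxed requirement can be stated in the standard (ε, δ) form of differential privacy, applied to the permutation the mechanism outputs. The rendering below is a generic sketch of that idea; the particular notion of “neighboring” permutations used here (permutations differing by a single transposition) is an assumption, not necessarily the paper’s exact definition.

```latex
% (epsilon, delta)-indistinguishability for a mixing mechanism M that outputs a
% permutation pi of the n input addresses. "Neighboring" (pi ~ pi') is assumed
% to mean the two permutations differ by a single transposition; a fully
% uniform shuffle corresponds to the special case epsilon = 0, delta = 0.
\[
  \Pr[\mathcal{M} = \pi] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M} = \pi'] + \delta
  \qquad \text{for all neighboring permutations } \pi \sim \pi' .
\]
```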
{"title":"(∈, δ)-Indistinguishable Mixing for Cryptocurrencies","authors":"Mingyu Liang, Ioanna Karantaidou, Foteini Baldimtsi, S. D. Gordon, Mayank Varia","doi":"10.2478/popets-2022-0004","DOIUrl":"https://doi.org/10.2478/popets-2022-0004","url":null,"abstract":"Abstract We propose a new theoretical approach for building anonymous mixing mechanisms for cryptocurrencies. Rather than requiring a fully uniform permutation during mixing, we relax the requirement, insisting only that neighboring permutations are similarly likely. This is defined formally by borrowing from the definition of differential privacy. This relaxed privacy definition allows us to greatly reduce the amount of interaction and computation in the mixing protocol. Our construction achieves O(n·polylog(n)) computation time for mixing n addresses, whereas all other mixing schemes require O(n2) total computation across all parties. Additionally, we support a smooth tolerance of fail-stop adversaries and do not require any trusted setup. We analyze the security of our generic protocol under the UC framework, and under a stand-alone, game-based definition. We finally describe an instantiation using ring signatures and confidential transactions.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"49 - 74"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49460475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OmniCrawl: Comprehensive Measurement of Web Tracking With Real Desktop and Mobile Browsers
Pub Date: 2021-11-20 DOI: 10.2478/popets-2022-0012 (PoPETs 2022, pp. 227–252)
Darion Cassel, Su-Chin Lin, Alessio Buraggina, William Wang, Andrew Zhang, Lujo Bauer, H. Hsiao, Limin Jia, Timothy Libert
Abstract Over half of all visits to websites now take place in a mobile browser, yet the majority of web privacy studies take the vantage point of desktop browsers, use emulated mobile browsers, or focus on just a single mobile browser. In this paper, we present a comprehensive web-tracking measurement study on mobile browsers and privacy-focused mobile browsers. Our study leverages a new web measurement infrastructure, OmniCrawl, which we develop to drive browsers on desktop computers and smartphones located on two continents. We capture web tracking measurements using 42 different non-emulated browsers simultaneously. We find that the third-party advertising and tracking ecosystem of mobile browsers is more similar to that of desktop browsers than previous findings suggested. We study privacy-focused browsers and find that their protections differ significantly and are generally weaker for lower-ranked sites. Our findings also show that common methodological choices made by web measurement studies, such as the use of emulated mobile browsers and Selenium, can lead to website behavior that deviates from what actual users experience.
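As a rough illustration of the kind of cross-browser comparison such a measurement enables (this is not OmniCrawl’s code; the log format, function names, and example data below are hypothetical), the similarity of two browsers’ tracking ecosystems can be quantified by comparing the third-party domains they contact on the same sites:

```python
# Hypothetical sketch of a cross-browser comparison: given per-browser logs of
# the request URLs observed when loading each site, compare the third-party
# domains contacted. The log format and example data are made up.
from urllib.parse import urlparse

def third_party_domains(request_urls, first_party):
    """Hostnames contacted that are not the first party or one of its subdomains."""
    hosts = {urlparse(url).hostname for url in request_urls}
    return {
        h for h in hosts
        if h and h != first_party and not h.endswith("." + first_party)
    }

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

desktop_log = {"example.com": ["https://tracker-a.net/px", "https://cdn.example.com/app.js"]}
mobile_log = {"example.com": ["https://tracker-a.net/px", "https://ads-b.io/beacon"]}

for site in desktop_log.keys() & mobile_log.keys():
    d = third_party_domains(desktop_log[site], site)
    m = third_party_domains(mobile_log[site], site)
    print(site, "third-party domain overlap:", round(jaccard(d, m), 2))
```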
{"title":"OmniCrawl: Comprehensive Measurement of Web Tracking With Real Desktop and Mobile Browsers","authors":"Darion Cassel, Su-Chin Lin, Alessio Buraggina, William Wang, Andrew Zhang, Lujo Bauer, H. Hsiao, Limin Jia, Timothy Libert","doi":"10.2478/popets-2022-0012","DOIUrl":"https://doi.org/10.2478/popets-2022-0012","url":null,"abstract":"Abstract Over half of all visits to websites now take place in a mobile browser, yet the majority of web privacy studies take the vantage point of desktop browsers, use emulated mobile browsers, or focus on just a single mobile browser instead. In this paper, we present a comprehensive web-tracking measurement study on mobile browsers and privacy-focused mobile browsers. Our study leverages a new web measurement infrastructure, OmniCrawl, which we develop to drive browsers on desktop computers and smartphones located on two continents. We capture web tracking measurements using 42 different non-emulated browsers simultaneously. We find that the third-party advertising and tracking ecosystem of mobile browsers is more similar to that of desktop browsers than previous findings suggested. We study privacy-focused browsers and find their protections differ significantly and in general are less for lower-ranked sites. Our findings also show that common methodological choices made by web measurement studies, such as the use of emulated mobile browsers and Selenium, can lead to website behavior that deviates from what actual users experience.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"227 - 252"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48612817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User Perceptions of Gmail’s Confidential Mode
Pub Date: 2021-11-20 DOI: 10.2478/popets-2022-0010 (PoPETs 2022, pp. 187–206)
E. A. Qahtani, Yousra Javed, Mohamed Shehab
Abstract Gmail’s confidential mode enables a user to send confidential emails and control access to their content through setting an expiration time and passcode, pre-expiry access revocation, and prevention of email forwarding, downloading, and printing. This paper aims to understand user perceptions and motivations for using Gmail’s confidential mode (GCM). Our structured interviews with 19 Gmail users at UNC Charlotte show that users utilize this mode to share their private documents with recipients and perceive that this mode encrypts their emails and attachments. The most commonly used feature of this mode is the default expiration time of one week, and the least used feature is the pre-expiry access revocation. Our analysis suggests several design improvements.
{"title":"User Perceptions of Gmail’s Confidential Mode","authors":"E. A. Qahtani, Yousra Javed, Mohamed Shehab","doi":"10.2478/popets-2022-0010","DOIUrl":"https://doi.org/10.2478/popets-2022-0010","url":null,"abstract":"Abstract Gmail’s confidential mode enables a user to send confidential emails and control access to their content through setting an expiration time and passcode, pre-expiry access revocation, and prevention of email forwarding, downloading, and printing. This paper aims to understand user perceptions and motivations for using Gmail’s confidential mode (GCM). Our structured interviews with 19 Gmail users at UNC Charlotte show that users utilize this mode to share their private documents with recipients and perceive that this mode encrypts their emails and attachments. The most commonly used feature of this mode is the default time expiration of one week, and the least used feature is the pre-expiry access revocation. Our analysis suggests several design improvements.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"187 - 206"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42215257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge Cross-Distillation for Membership Privacy
Pub Date: 2021-11-02 DOI: 10.2478/popets-2022-0050 (PoPETs 2022, pp. 362–377)
R. Chourasia, Batnyam Enkhtaivan, Kunihiro Ito, Junki Mori, Isamu Teranishi, Hikaru Tsuchida
Abstract A membership inference attack (MIA) poses privacy risks for the training data of a machine learning model. With an MIA, an attacker guesses whether the target data are a member of the training dataset. The state-of-the-art defense against MIAs, distillation for membership privacy (DMP), requires not only private data for protection but also a large amount of unlabeled public data. However, in certain privacy-sensitive domains, such as medicine and finance, the availability of public data is not guaranteed. Moreover, a trivial method for generating public data by using generative adversarial networks significantly decreases the model accuracy, as reported by the authors of DMP. To overcome this problem, we propose a novel defense against MIAs that uses knowledge distillation without requiring public data. Our experiments show that the privacy protection and accuracy of our defense are comparable to those of DMP for the benchmark tabular datasets used in MIA research, Purchase100 and Texas100, and that, on the image dataset CIFAR10, our defense has a much better privacy-utility trade-off than existing defenses that also do not use public data.
{"title":"Knowledge Cross-Distillation for Membership Privacy","authors":"R. Chourasia, Batnyam Enkhtaivan, Kunihiro Ito, Junki Mori, Isamu Teranishi, Hikaru Tsuchida","doi":"10.2478/popets-2022-0050","DOIUrl":"https://doi.org/10.2478/popets-2022-0050","url":null,"abstract":"Abstract A membership inference attack (MIA) poses privacy risks for the training data of a machine learning model. With an MIA, an attacker guesses if the target data are a member of the training dataset. The state-of-the-art defense against MIAs, distillation for membership privacy (DMP), requires not only private data for protection but a large amount of unlabeled public data. However, in certain privacy-sensitive domains, such as medicine and finance, the availability of public data is not guaranteed. Moreover, a trivial method for generating public data by using generative adversarial networks significantly decreases the model accuracy, as reported by the authors of DMP. To overcome this problem, we propose a novel defense against MIAs that uses knowledge distillation without requiring public data. Our experiments show that the privacy protection and accuracy of our defense are comparable to those of DMP for the benchmark tabular datasets used in MIA research, Purchase100 and Texas100, and our defense has a much better privacy-utility trade-off than those of the existing defenses that also do not use public data for the image dataset CIFAR10.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"362 - 377"},"PeriodicalIF":0.0,"publicationDate":"2021-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42799863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward Uncensorable, Anonymous and Private Access Over Satoshi Blockchains
Pub Date: 2021-10-16 DOI: 10.2478/popets-2022-0011 (PoPETs 2022, pp. 207–226)
Ruben Recabarren, Bogdan Carbunar
Abstract Providing unrestricted access to sensitive content such as news and software is difficult in the presence of adaptive and resourceful surveillance and censoring adversaries. In this paper we leverage the distributed and resilient nature of commercial Satoshi blockchains to develop the first provably secure, censorship-resistant, cost-efficient storage system with anonymous and private access, built on top of commercial cryptocurrency transactions. We introduce max-rate transactions, a practical construct to persist data of arbitrary size entirely in a Satoshi blockchain. We leverage max-rate transactions to develop UWeb, a blockchain-based storage system that charges publishers to self-sustain its decentralized infrastructure. UWeb organizes blockchain-stored content for easy retrieval, and enables clients to store and access content with provable anonymity, privacy and censorship resistance properties. We present results from UWeb experiments with writing 268.21 MB of data into the live Litecoin blockchain, including 4.5 months of live-feed BBC articles, and 41 censorship-resistant tools. The max-rate writing throughput (183 KB/s) and blockchain utilization (88%) exceed those of state-of-the-art solutions by 2–3 orders of magnitude and broke Litecoin’s record for daily average block size. Our simulations with up to 3,000 concurrent UWeb writers confirm that UWeb does not impact the confirmation delays of financial transactions.
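The storage primitive underlying UWeb is writing arbitrary-length data across a sequence of transactions. Purely to illustrate the bookkeeping such a writer needs (the chunk size, script encoding, and fee handling of real max-rate transactions are not modeled here, and all constants are assumptions), a naive chunk-and-reassemble helper might look like this:

```python
# Illustrative-only sketch: split a payload into ordered, verifiable chunks of
# the kind one would embed across blockchain transactions. Real max-rate
# transactions pack data into transaction outputs very differently; the chunk
# size below is an assumed constant.
import hashlib

CHUNK_BYTES = 512  # assumed per-transaction data capacity

def chunk_payload(payload: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Split a payload into ordered chunks, each tagged with its index and a
    digest of the whole payload so a reader can reassemble and verify it."""
    digest = hashlib.sha256(payload).digest()
    return [
        (index, digest, payload[offset:offset + chunk_bytes])
        for index, offset in enumerate(range(0, len(payload), chunk_bytes))
    ]

def reassemble(chunks):
    """Reassemble chunks (possibly received out of order) and verify integrity."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    payload = b"".join(chunk[2] for chunk in ordered)
    if hashlib.sha256(payload).digest() != ordered[0][1]:
        raise ValueError("reassembled payload does not match its digest")
    return payload

article = b"example live-feed article text " * 100
assert reassemble(chunk_payload(article)) == article
```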
{"title":"Toward Uncensorable, Anonymous and Private Access Over Satoshi Blockchains","authors":"Ruben Recabarren, Bogdan Carbunar","doi":"10.2478/popets-2022-0011","DOIUrl":"https://doi.org/10.2478/popets-2022-0011","url":null,"abstract":"Abstract Providing unrestricted access to sensitive content such as news and software is difficult in the presence of adaptive and resourceful surveillance and censoring adversaries. In this paper we leverage the distributed and resilient nature of commercial Satoshi blockchains to develop the first provably secure, censorship resistant, cost-efficient storage system with anonymous and private access, built on top of commercial cryptocurrency transactions. We introduce max-rate transactions, a practical construct to persist data of arbitrary size entirely in a Satoshi blockchain. We leverage max-rate transactions to develop UWeb, a blockchain-based storage system that charges publishers to self-sustain its decentralized infrastructure. UWeb organizes blockchain-stored content for easy retrieval, and enables clients to store and access content with provable anonymity, privacy and censorship resistance properties. We present results from UWeb experiments with writing 268.21 MB of data into the live Litecoin blockchain, including 4.5 months of live-feed BBC articles, and 41 censorship resistant tools. The max-rate writing throughput (183 KB/s) and blockchain utilization (88%) exceed those of state-of-the-art solutions by 2-3 orders of magnitude and broke Litecoin’s record of the daily average block size. Our simulations with up to 3,000 concurrent UWeb writers confirm that UWeb does not impact the confirmation delays of financial transactions.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"207 - 226"},"PeriodicalIF":0.0,"publicationDate":"2021-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45992563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Are iPhones Really Better for Privacy? A Comparative Study of iOS and Android Apps
Pub Date: 2021-09-28 DOI: 10.2478/popets-2022-0033 (PoPETs 2022, pp. 6–24)
Konrad Kollnig, A. Shuba, Reuben Binns, M. V. Kleek, N. Shadbolt
Abstract While many studies have looked at privacy properties of the Android and Google Play app ecosystem, comparatively much less is known about iOS and the Apple App Store, the most widely used ecosystem in the US. At the same time, there is increasing competition around privacy between these smartphone operating system providers. In this paper, we present a study of 24k Android and iOS apps from 2020 along several dimensions relating to user privacy. We find that third-party tracking and the sharing of unique user identifiers was widespread in apps from both ecosystems, even in apps aimed at children. In the children’s category, iOS apps tended to use less advertising-related tracking than their Android counterparts, but could more often access children’s location. Across all studied apps, our study highlights widespread potential violations of US, EU and UK privacy law, including 1) the use of third-party tracking without user consent, 2) the lack of parental consent before sharing personally identifiable information (PII) with third parties in children’s apps, 3) the non-data-minimising configuration of tracking libraries, 4) the sending of personal data to countries without an adequate level of data protection, and 5) the continued absence of transparency around tracking, partly due to design decisions by Apple and Google. Overall, we find that neither platform is clearly better than the other for privacy across the dimensions we studied.
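One common ingredient of such app-privacy measurements is flagging third-party tracking by matching the domains an app contacts against a curated tracker list. The sketch below is a hypothetical, simplified version of that step; the tracker list, package names, and logs are made up, and real studies combine this with static analysis of the SDKs bundled in each app.

```python
# Hypothetical sketch of one measurement step: flag third-party tracking by
# matching the domains an app contacts against a known-tracker list. The list
# entries, package names, and logs below are made up.
TRACKER_DOMAINS = {"graph.facebook.com", "app-measurement.com", "branch.io"}

def trackers_contacted(contacted_domains, tracker_domains=TRACKER_DOMAINS):
    """Return the known trackers matched by the contacted domains (exact or subdomain match)."""
    hits = set()
    for domain in contacted_domains:
        for tracker in tracker_domains:
            if domain == tracker or domain.endswith("." + tracker):
                hits.add(tracker)
    return hits

app_log = {"com.example.kidsgame": ["cdn.example.com", "app-measurement.com", "api.branch.io"]}
for app, domains in app_log.items():
    print(app, "->", sorted(trackers_contacted(domains)))
```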
{"title":"Are iPhones Really Better for Privacy? A Comparative Study of iOS and Android Apps","authors":"Konrad Kollnig, A. Shuba, Reuben Binns, M. V. Kleek, N. Shadbolt","doi":"10.2478/popets-2022-0033","DOIUrl":"https://doi.org/10.2478/popets-2022-0033","url":null,"abstract":"Abstract While many studies have looked at privacy properties of the Android and Google Play app ecosystem, comparatively much less is known about iOS and the Apple App Store, the most widely used ecosystem in the US. At the same time, there is increasing competition around privacy between these smartphone operating system providers. In this paper, we present a study of 24k Android and iOS apps from 2020 along several dimensions relating to user privacy. We find that third-party tracking and the sharing of unique user identifiers was widespread in apps from both ecosystems, even in apps aimed at children. In the children’s category, iOS apps tended to use fewer advertising-related tracking than their Android counterparts, but could more often access children’s location. Across all studied apps, our study highlights widespread potential violations of US, EU and UK privacy law, including 1) the use of third-party tracking without user consent, 2) the lack of parental consent before sharing personally identifiable information (PII) with third-parties in children’s apps, 3) the non-data-minimising configuration of tracking libraries, 4) the sending of personal data to countries without an adequate level of data protection, and 5) the continued absence of transparency around tracking, partly due to design decisions by Apple and Google. Overall, we find that neither platform is clearly better than the other for privacy across the dimensions we studied.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"6 - 24"},"PeriodicalIF":0.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47344369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SoK: Cryptographic Confidentiality of Data on Mobile Devices
Pub Date: 2021-09-22 DOI: 10.2478/popets-2022-0029 (PoPETs 2022, pp. 586–607)
Maximilian Zinkus, Tushar M. Jois, M. Green
Abstract Mobile devices have become an indispensable component of modern life. Their high storage capacity gives these devices the capability to store vast amounts of sensitive personal data, which makes them a high-value target: these devices are routinely stolen by criminals for data theft, and are increasingly viewed by law enforcement agencies as a valuable source of forensic data. Over the past several years, providers have deployed a number of advanced cryptographic features intended to protect data on mobile devices, even in the strong setting where an attacker has physical access to a device. Many of these techniques draw from the research literature, but have been adapted to this entirely new problem setting. This involves a number of novel challenges, which are incompletely addressed in the literature. In this work, we outline those challenges, and systematize the known approaches to securing user data against extraction attacks. Our work proposes a methodology that researchers can use to analyze cryptographic data confidentiality for mobile devices. We evaluate the existing literature for securing devices against data extraction adversaries with powerful capabilities including access to devices and to the cloud services they rely on. We then analyze existing mobile device confidentiality measures to identify research areas that have not received proper attention from the community and represent opportunities for future research.
{"title":"SoK: Cryptographic Confidentiality of Data on Mobile Devices","authors":"Maximilian Zinkus, Tushar M. Jois, M. Green","doi":"10.2478/popets-2022-0029","DOIUrl":"https://doi.org/10.2478/popets-2022-0029","url":null,"abstract":"Abstract Mobile devices have become an indispensable component of modern life. Their high storage capacity gives these devices the capability to store vast amounts of sensitive personal data, which makes them a high-value target: these devices are routinely stolen by criminals for data theft, and are increasingly viewed by law enforcement agencies as a valuable source of forensic data. Over the past several years, providers have deployed a number of advanced cryptographic features intended to protect data on mobile devices, even in the strong setting where an attacker has physical access to a device. Many of these techniques draw from the research literature, but have been adapted to this entirely new problem setting. This involves a number of novel challenges, which are incompletely addressed in the literature. In this work, we outline those challenges, and systematize the known approaches to securing user data against extraction attacks. Our work proposes a methodology that researchers can use to analyze cryptographic data confidentiality for mobile devices. We evaluate the existing literature for securing devices against data extraction adversaries with powerful capabilities including access to devices and to the cloud services they rely on. We then analyze existing mobile device confidentiality measures to identify research areas that have not received proper attention from the community and represent opportunities for future research.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"586 - 607"},"PeriodicalIF":0.0,"publicationDate":"2021-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43161457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making the Most of Parallel Composition in Differential Privacy
Pub Date: 2021-09-19 DOI: 10.2478/popets-2022-0013 (PoPETs 2022, pp. 253–273)
Joshua Smith, H. Asghar, Gianpaolo Gioiosa, Sirine Mrabet, Serge Gaspers, P. Tyler
Abstract We show that the ‘optimal’ use of the parallel composition theorem corresponds to finding the size of the largest subset of queries that ‘overlap’ on the data domain, a quantity we call the maximum overlap of the queries. It has previously been shown that a certain instance of this problem, formulated in terms of determining the sensitivity of the queries, is NP-hard, but also that it is possible to use graph-theoretic algorithms, such as finding the maximum clique, to approximate query sensitivity. In this paper, we consider a significant generalization of the aforementioned instance which encompasses both a wider range of differentially private mechanisms and a broader class of queries. We show that for a particular class of predicate queries, determining if they are disjoint can be done in time polynomial in the number of attributes. For this class, we show that the maximum overlap problem remains NP-hard as a function of the number of queries. However, we show that efficient approximate solutions exist by relating maximum overlap to the clique and chromatic numbers of a certain graph determined by the queries. The link to chromatic number allows us to use more efficient approximate algorithms, which cannot be done for the clique number as it may underestimate the privacy budget. Our approach is defined in the general setting of f-differential privacy, which subsumes standard pure differential privacy and Gaussian differential privacy. We prove the parallel composition theorem for f-differential privacy. We evaluate our approach on synthetic and real-world data sets of queries. We show that the approach can scale to large domain sizes (up to 10^20000), and that its application can reduce the noise added to query answers by up to 60%.
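To make the graph-based idea concrete, the sketch below builds an “overlap graph” over a handful of range queries and uses greedy coloring to obtain an upper bound on the chromatic number; queries sharing a color are pairwise disjoint, so parallel composition applies within each color class and the budget only needs to be split across color classes. The query representation, overlap test, and budget split here are simplifying assumptions, not the paper’s algorithm.

```python
# Illustrative sketch (not the paper's implementation): queries are vertices,
# edges join queries whose ranges overlap on the data domain, and a greedy
# coloring upper-bounds the chromatic number. Queries are modeled as
# axis-aligned ranges over named attributes, which is a simplifying assumption.
import networkx as nx

def overlaps(q1, q2):
    """Two range queries overlap unless they are separated on some shared attribute."""
    for attr in q1.keys() & q2.keys():
        (lo1, hi1), (lo2, hi2) = q1[attr], q2[attr]
        if hi1 < lo2 or hi2 < lo1:
            return False
    return True

queries = [
    {"age": (18, 30)},
    {"age": (25, 40)},
    {"age": (50, 69)},
    {"age": (20, 28), "income": (0, 50_000)},
]

G = nx.Graph()
G.add_nodes_from(range(len(queries)))
G.add_edges_from(
    (i, j)
    for i in range(len(queries))
    for j in range(i + 1, len(queries))
    if overlaps(queries[i], queries[j])
)

# Queries sharing a color are pairwise disjoint, so parallel composition applies
# within each color class; composing across classes splits the budget by the
# number of colors instead of the total number of queries.
coloring = nx.greedy_color(G, strategy="largest_first")
num_colors = max(coloring.values()) + 1
total_epsilon = 1.0  # assumed overall privacy budget
print("queries:", len(queries), "color classes:", num_colors)
print("per-query epsilon:", total_epsilon / num_colors)
```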
{"title":"Making the Most of Parallel Composition in Differential Privacy","authors":"Joshua Smith, H. Asghar, Gianpaolo Gioiosa, Sirine Mrabet, Serge Gaspers, P. Tyler","doi":"10.2478/popets-2022-0013","DOIUrl":"https://doi.org/10.2478/popets-2022-0013","url":null,"abstract":"Abstract We show that the ‘optimal’ use of the parallel composition theorem corresponds to finding the size of the largest subset of queries that ‘overlap’ on the data domain, a quantity we call the maximum overlap of the queries. It has previously been shown that a certain instance of this problem, formulated in terms of determining the sensitivity of the queries, is NP-hard, but also that it is possible to use graph-theoretic algorithms, such as finding the maximum clique, to approximate query sensitivity. In this paper, we consider a significant generalization of the aforementioned instance which encompasses both a wider range of differentially private mechanisms and a broader class of queries. We show that for a particular class of predicate queries, determining if they are disjoint can be done in time polynomial in the number of attributes. For this class, we show that the maximum overlap problem remains NP-hard as a function of the number of queries. However, we show that efficient approximate solutions exist by relating maximum overlap to the clique and chromatic numbers of a certain graph determined by the queries. The link to chromatic number allows us to use more efficient approximate algorithms, which cannot be done for the clique number as it may underestimate the privacy budget. Our approach is defined in the general setting of f-differential privacy, which subsumes standard pure differential privacy and Gaussian differential privacy. We prove the parallel composition theorem for f-differential privacy. We evaluate our approach on synthetic and real-world data sets of queries. We show that the approach can scale to large domain sizes (up to 1020000), and that its application can reduce the noise added to query answers by up to 60%.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"253 - 273"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47515001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}