Pub Date: 2023-05-13
DOI: https://dl.acm.org/doi/10.1145/3590152
Isabel Wagner
It is well known that most users do not read privacy policies but almost always tick the box to agree with them. While the length and readability of privacy policies have been well studied and many approaches for policy analysis based on natural language processing have been proposed, existing studies are limited in their depth and scope, often focusing on a small number of data practices at a single point in time. In this article, we fill this gap by analyzing the 25-year history of privacy policies using machine learning and natural language processing and presenting a comprehensive analysis of policy contents. Specifically, we collect a large-scale longitudinal corpus of privacy policies from 1996 to 2021 and analyze their content in terms of the data practices they describe, the rights they grant to users, and the rights they reserve for their organizations. We pay particular attention to changes in response to recent privacy regulations such as the GDPR and CCPA. We observe some positive changes, such as reductions in data collection post-GDPR, but also a range of concerning data practices, such as widespread implicit data collection for which users have no meaningful choices or access rights. Our work is an important step toward making privacy policies machine readable on the user side, which would help users match their privacy preferences against the policies offered by web services.
Title: Privacy Policies across the Ages: Content of Privacy Policies 1996–2021
Journal: ACM Transactions on Privacy and Security
Pub Date: 2023-05-13
DOI: https://dl.acm.org/doi/10.1145/3588770
Tom Bolton, Tooska Dargahi, Sana Belguith, Carsten Maple
The use of voice-controlled virtual assistants (VAs) is significant, and user numbers increase every year. Extensive use of VAs has provided the large, cash-rich technology companies who sell them with another way of consuming users’ data, providing a lucrative revenue stream. Whilst these companies are legally obliged to treat users’ information “fairly and responsibly,” artificial intelligence techniques used to process data have become incredibly sophisticated, leading to users’ concerns that a lack of clarity is making it hard to understand the nature and scope of data collection and use.
There has been little work undertaken on a self-contained user awareness tool targeting VAs. PrivExtractor, a novel web-based awareness dashboard for VA users, intends to redress this imbalance of understanding between the data “processors” and the user. It aims to achieve this using the four largest VA vendors as a case study and providing a comparison function that examines the four companies’ privacy practices and their compliance with data protection law.
As a result of this research, we conclude that the companies studied are largely compliant with the law, as expected. However, the user remains disadvantaged due to the ineffectiveness of current data regulation that does not oblige the companies to fully and transparently disclose how and when they use, share, or profit from the data. Furthermore, the software tool developed during the research is, we believe, the first that is capable of a comparative analysis of VA privacy with a visual demonstration to increase ease of understanding for the user.
Title: PrivExtractor: Toward Redressing the Imbalance of Understanding between Virtual Assistant Users and Vendors
Journal: ACM Transactions on Privacy and Security
Pub Date: 2023-04-15
DOI: https://dl.acm.org/doi/10.1145/3585536
Sona Alex, Dhanaraj K. J., Deepthi P. P.
Adopting mobile healthcare network (MHN) services such as disease detection is fraught with concerns about the security and privacy of the entities involved and the resource restrictions at the Internet of Things (IoT) nodes. Hence, the essential requirements for disease detection services are to (i) produce accurate and fast disease detection without jeopardizing the privacy of health clouds and medical users and (ii) reduce the computational and transmission overhead (energy consumption) of the IoT devices while maintaining privacy. For privacy preservation of widely used neural network (NN)-based disease detection, existing literature suggests either computationally heavy public key fully homomorphic encryption (FHE) or secure multiparty computation with a large number of interactions. Hence, the existing privacy-preserving NN schemes are energy consuming and not suitable for resource-constrained IoT nodes in MHN. This work proposes a lightweight, symmetric key FHE scheme (SkFhe) to address the issues involved in implementing privacy-preserving NN. Based on SkFhe, the widely used non-linear activation functions ReLU and Leaky ReLU are implemented over the encrypted domain. Furthermore, based on the proposed privacy-preserving linear transformation and non-linear activation functions, an energy-efficient, accurate, and privacy-preserving NN is proposed. The proposed scheme guarantees privacy preservation of the health cloud’s NN model and the medical user’s data. The experimental analysis demonstrates that the proposed solution dramatically reduces the overhead in communication and computation at the user side compared to the existing schemes. Moreover, the improved energy efficiency at the user is accomplished with reduced diagnosis time without sacrificing classification accuracy.
Title: Energy Efficient and Secure Neural Network–based Disease Detection Framework for Mobile Healthcare Network
Journal: ACM Transactions on Privacy and Security
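For readers unfamiliar with the two activation functions named in the abstract above, the following is a minimal plaintext sketch of ReLU and Leaky ReLU applied after a linear transformation. It is not the SkFhe scheme and performs no encryption; the function names, the `negative_slope` default of 0.01, and the toy weights are assumptions chosen for illustration only.

```python
from typing import Callable, List


def relu(x: float) -> float:
    """Plaintext ReLU: pass positive values, clamp the rest to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Plaintext Leaky ReLU: scale negative values by a small slope.

    The default slope of 0.01 is an assumed, commonly used value.
    """
    return x if x > 0 else negative_slope * x


def dense_layer(
    weights: List[List[float]],
    bias: List[float],
    inputs: List[float],
    activation: Callable[[float], float],
) -> List[float]:
    """One NN layer: linear transformation followed by an activation."""
    outputs = []
    for row, b in zip(weights, bias):
        pre_activation = sum(w * v for w, v in zip(row, inputs)) + b
        outputs.append(activation(pre_activation))
    return outputs


if __name__ == "__main__":
    # Toy values, purely illustrative.
    toy_weights = [[1.0, -1.0], [0.5, 0.5]]
    toy_bias = [0.0, 1.0]
    toy_inputs = [1.0, 2.0]
    print(dense_layer(toy_weights, toy_bias, toy_inputs, relu))
    print(dense_layer(toy_weights, toy_bias, toy_inputs, leaky_relu))
```

A privacy-preserving variant of the kind the abstract describes would have to evaluate both the linear step and the comparison with zero over ciphertexts; how SkFhe does so is not stated in this listing.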
Pub Date: 2023-04-14
DOI: https://dl.acm.org/doi/10.1145/3585386
Litao Li, Steven H. H. Ding, Yuan Tian, Benjamin C. M. Fung, Philippe Charland, Weihan Ou, Leo Song, Congwei Chen
Software vulnerabilities have been posing tremendous reliability threats to the general public as well as critical infrastructures, and there have been many studies aiming to detect and mitigate software defects at the binary level. Most of the standard practices leverage both static and dynamic analysis, which have several drawbacks like heavy manual workload and high complexity. Existing deep learning-based solutions not only struggle to capture the complex relationships among different variables from raw binary code but also lack the explainability required for humans to verify, evaluate, and patch the detected bugs.
We propose VulANalyzeR, a deep learning-based model, for automated binary vulnerability detection, Common Weakness Enumeration-type classification, and root cause analysis to enhance safety and security. VulANalyzeR features sequential and topological learning through recurrent units and graph convolution to simulate how a program is executed. The attention mechanism is integrated throughout the model, which shows how different instructions and the corresponding states contribute to the final classification. It also classifies the specific vulnerability type through multi-task learning, as this not only provides further explanation but also allows faster patching for zero-day vulnerabilities. We show that VulANalyzeR achieves better performance for vulnerability detection than the state-of-the-art baselines. Additionally, a Common Vulnerability Exposure dataset is used to evaluate real complex vulnerabilities. We conduct case studies to show that VulANalyzeR is able to accurately identify the instructions and basic blocks that cause the vulnerability even without being given any prior knowledge related to the locations during the training phase.
Title: VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution
Journal: ACM Transactions on Privacy and Security
Pub Date: 2023-04-14
DOI: https://dl.acm.org/doi/10.1145/3577020
Andreas V. Hess, Sebastian A. Mödersheim, Achim D. Brucker
Communication networks like the Internet form a large distributed system where a huge number of components run in parallel, such as security protocols and distributed web applications. Where security is concerned, it is obviously infeasible to verify them all at once as one monolithic entity; rather, one has to verify individual components in isolation.
While many typical components like TLS have been studied intensively, there exists much less research on analyzing and ensuring the security of the composition of security protocols. This is a problem since the composition of systems that are secure in isolation can easily be insecure. The main goal of compositionality is thus a theorem of the form: given a set of components that are already proved secure in isolation and that satisfy a number of easy-to-check conditions, their parallel composition is also secure. Said conditions should of course also be realistic in practice, or better yet, already be satisfied for many existing components. Another benefit of compositionality is that when one would like to exchange a component with another one, all that is needed is the proof that the new component is secure in isolation and satisfies the composition conditions, without having to re-prove anything about the other components.
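The general shape of such a compositionality theorem can be written schematically as follows. This is only a sketch of the form described in the paragraph above, not the article's actual statement: the symbols $P_1, \dots, P_n$ for the components, the predicates $\mathit{secure}$ and $\mathit{composable}$, and the operator $\parallel$ for parallel composition are notation assumed here for illustration.

```latex
\[
  \Bigl(\forall i \in \{1,\dots,n\}.\ \mathit{secure}(P_i)\Bigr)
  \;\wedge\;
  \mathit{composable}(P_1,\dots,P_n)
  \;\Longrightarrow\;
  \mathit{secure}(P_1 \parallel \cdots \parallel P_n)
\]
```

Read this way, exchanging a component $P_k$ for a new component $P_k'$ only requires establishing $\mathit{secure}(P_k')$ and rechecking the composition conditions; the proofs for the other components are reused unchanged.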
This article has three contributions over previous work in parallel compositionality. First, we extend the compositionality paradigm to stateful systems: while previous approaches work only for simple protocols that only have a local session state, our result supports participants who maintain long-term databases that can be shared