Large language models: a new approach for privacy policy analysis at scale

Computing | IF 3.3 | Zone 3 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2024-08-22 | DOI: 10.1007/s00607-024-01331-9
David Rodriguez, Ian Yang, Jose M. Del Alamo, Norman Sadeh
Citations: 0

Abstract

The number and dynamic nature of web sites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people’s privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This includes engineering prompts and evaluating different configurations of these LLM techniques. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%. This score is higher than scores reported earlier in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.
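
The evaluation described above (prompting an LLM to categorize policy text, then scoring the output against annotations with F1) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the category names, the prompt wording, and the helper functions `build_prompt`, `parse_labels`, and `micro_f1` are all hypothetical, and the model call itself is left out so the example runs offline.

```python
from typing import List, Set

# Hypothetical label set for illustration only; the abstract does not list
# the categories used in the paper.
CATEGORIES = ["first_party_collection", "third_party_sharing", "data_retention"]


def build_prompt(segment: str, categories: List[str]) -> str:
    """Assemble a zero-shot classification prompt (wording is an assumption)."""
    options = ", ".join(categories)
    return (
        "You are analyzing a privacy policy.\n"
        f"Categories: {options}\n"
        f"Text: {segment}\n"
        "Answer with the matching categories, comma-separated, or 'none'."
    )


def parse_labels(reply: str, categories: List[str]) -> Set[str]:
    """Keep only known category names from a comma-separated model reply."""
    tokens = {token.strip().lower() for token in reply.split(",")}
    return tokens & set(categories)


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-segment label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not annotated
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # annotated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up annotations and made-up model replies, purely to exercise the code.
    gold = [{"first_party_collection"}, {"third_party_sharing", "data_retention"}]
    replies = ["first_party_collection", "Third_Party_Sharing, other"]
    pred = [parse_labels(reply, CATEGORIES) for reply in replies]
    print(build_prompt("We share your email with advertisers.", CATEGORIES))
    print(f"micro-F1 = {micro_f1(gold, pred):.2f}")
```

In practice the replies would come from a model such as ChatGPT or Llama 2 and the gold labels from an annotated corpus; micro-averaging is only one of several ways to aggregate F1, and the abstract does not say which one the authors report.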

Source journal: Computing (Engineering & Technology - Computer Science: Theory & Methods)
CiteScore: 8.20
Self-citation rate: 2.70%
Articles published: 107
Review time: 3 months
Journal description: Computing publishes original papers, short communications and surveys on all fields of computing. The contributions should be written in English and may be of theoretical or applied nature; the essential criteria are computational relevance and systematic foundation of results.