A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs

IF 3.5 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Empirical Software Engineering Pub Date : 2024-06-14 DOI:10.1007/s10664-024-10491-3
Muhammad Ilyas Azeem, Sallam Abualhaija
{"title":"A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs","authors":"Muhammad Ilyas Azeem, Sallam Abualhaija","doi":"10.1007/s10664-024-10491-3","DOIUrl":null,"url":null,"abstract":"<p>Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern of requirements engineering. Personal data which is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching to billions of Euros. Software systems involving personal data processing must adhere to the legal obligations stipulated both at a general level in GDPR as well as the obligations outlined in DPAs highlighting specific business. In other words, a DPA is yet another source from which requirements engineers can elicit legal requirements. However, the DPA must be complete according to GDPR to ensure that the elicited requirements cover the complete set of obligations. Therefore, checking the completeness of DPAs is a prerequisite step towards developing a compliant system. Analyzing DPAs with respect to GDPR entirely manually is time consuming and requires adequate legal expertise. In this paper, we propose an automation strategy that addresses the completeness checking of DPAs against GDPR provisions as a text classification problem. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed F<span>\\(_2\\)</span> score on a set of 30 real DPAs. Our evaluation shows that best-performing solutions yield F<span>\\(_2\\)</span> score of 86.7% and 89.7% are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":null,"pages":null},"PeriodicalIF":3.5000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-024-10491-3","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern of requirements engineering. Personal data which is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching to billions of Euros. Software systems involving personal data processing must adhere to the legal obligations stipulated both at a general level in GDPR as well as the obligations outlined in DPAs highlighting specific business. In other words, a DPA is yet another source from which requirements engineers can elicit legal requirements. However, the DPA must be complete according to GDPR to ensure that the elicited requirements cover the complete set of obligations. Therefore, checking the completeness of DPAs is a prerequisite step towards developing a compliant system. Analyzing DPAs with respect to GDPR entirely manually is time consuming and requires adequate legal expertise. In this paper, we propose an automation strategy that addresses the completeness checking of DPAs against GDPR provisions as a text classification problem. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed F\(_2\) score on a set of 30 real DPAs. Our evaluation shows that best-performing solutions yield F\(_2\) score of 86.7% and 89.7% are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
关于 GDPR 人工智能支持的 DPA 完整性检查的多方案研究
明确软件系统的法律要求,确保其符合适用的法规,是需求工程的一个主要关注点。组织收集的个人数据通常会与其他组织共享,以执行某些处理活动。在这种情况下,《一般数据保护条例》(GDPR)要求签发数据处理协议(DPA),对数据处理进行规范,并进一步确保个人数据受到保护。违反 GDPR 可导致高达数十亿欧元的巨额罚款。涉及个人数据处理的软件系统必须遵守 GDPR 中规定的一般法律义务,以及 DPA 中概述的针对特定业务的义务。换句话说,DPA 是需求工程师可以从中获得法律要求的另一个来源。然而,根据 GDPR,DPA 必须是完整的,以确保所激发的需求涵盖整套义务。因此,检查 DPA 的完整性是开发合规系统的前提步骤。完全手动分析 DPA 与 GDPR 的关系非常耗时,而且需要足够的法律专业知识。在本文中,我们提出了一种自动化策略,将根据 GDPR 条款对 DPA 进行完整性检查作为一个文本分类问题来解决。具体来说,我们采用了十种不同技术的替代解决方案,即传统机器学习、深度学习、语言建模和少量学习。我们工作的目标是通过实证研究这些不同技术在法律领域的应用情况。我们在一组 30 个真实的 DPA 上计算了 F\(_2\) 分数。我们的评估显示,基于预训练的 BERT 和 RoBERTa 语言模型,表现最好的解决方案的 F\(_2\) 分数分别为 86.7% 和 89.7%。我们的分析进一步表明,其他基于深度学习(如 BiLSTM)和少量学习(如 SetFit)的替代解决方案可以达到相当的准确率,但开发效率更高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Empirical Software Engineering
Empirical Software Engineering 工程技术-计算机:软件工程
CiteScore
8.50
自引率
12.20%
发文量
169
审稿时长
>12 weeks
期刊介绍: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.
期刊最新文献
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues Quality issues in machine learning software systems An empirical study of token-based micro commits Software product line testing: a systematic literature review Consensus task interaction trace recommender to guide developers’ software navigation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1