Developing a Global Data Breach Database and the Challenges Encountered

Nelson Novaes Neto, S. Madnick, A. Paula, Natasha Malara Borges
Journal of Data and Information Quality (JDIQ), pp. 1-33. Published 2021-01-15. DOI: 10.1145/3439873 (https://doi.org/10.1145/3439873)
Citations: 16

Abstract

If the mantra “data is the new oil” of our digital economy is correct, then data leak incidents are the critical disasters of the online society. The initial goal of our research was to present a comprehensive database of data breaches of personal information that took place in 2018 and 2019. This information was to be drawn from press reports, industry studies, and reports from regulatory agencies across the world. This article identifies the 430 largest data breach incidents among more than 10,000 data breach incidents. In the process, we encountered many complications, especially regarding the lack of standardization in reporting. This article should be especially interesting to the readers of JDIQ because it describes both the range of data quality and consistency issues found and what was learned from the database created. The database, available at https://www.databreachdb.com, shows that the number of data records breached in those 430 incidents increased from around 4B in 2018 to more than 22B in 2019. This increase occurred despite strong efforts by regulatory agencies across the world to enforce strict rules on data protection and privacy, such as the General Data Protection Regulation (GDPR), which went into effect in Europe in May 2018. Such regulatory effort could explain why so many more data breach cases are reported in the European Union than in the U.S. (more than 10,000 data breaches publicly reported in the U.S. since 2018, while the EU reported more than 160,000 data breaches since May 2018). However, we still face the problem of an excessive number of breach incidents around the world. This research helps to understand the challenges of achieving proper visibility of such incidents on a global scale. The results of this research can help government entities, regulatory bodies, security and data quality researchers, companies, and managers to improve the data quality of data breach reporting and increase the visibility of the data breach landscape around the world in the future.