Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani
Annals of Data Science, volume 12, issue 1, pages 253-279. Published 2024-07-13. DOI: 10.1007/s40745-024-00557-w. https://link.springer.com/article/10.1007/s40745-024-00557-w
A Review of Anonymization Algorithms and Methods in Big Data
In the era of big data, as data grows in volume and complexity, the main challenge is how to use it while preserving user privacy. This study set out to address that challenge. We examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls, covering how each operates, its advantages, disadvantages, and uses, the challenges of adapting it to big data, and possible solutions. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss. When faced with large volumes of data, these methods may not be able to fully process the data. They may also be ineffective against re-identification, linkage, and inference attacks. We introduced emerging methods that can provide improved privacy with minimal data loss and that scale to big data. Finally, we examined directions for future research and raised important questions that can help improve existing algorithms or develop new methods to better manage the complexity and scale of unstructured data.
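As a concrete illustration of one method the abstract names, differential privacy, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is not taken from the paper: the function names `laplace_noise` and `private_count`, the record layout, and the choice of a counting query (sensitivity 1) are assumptions made only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Hypothetical helper: a counting query changes by at most 1 when one
    record is added or removed (sensitivity 1), so Laplace noise with
    scale 1/epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Illustrative records only; not data from the paper.
    people = [{"age": 34}, {"age": 61}, {"age": 47}, {"age": 29}]
    noisy = private_count(people, lambda p: p["age"] > 40, epsilon=0.5)
    print(f"noisy count of people over 40: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and so gives stronger privacy at the cost of accuracy, which is one simple instance of the trade-off between privacy and data loss that the abstract describes.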
About the journal:
Annals of Data Science (ADS) publishes cutting-edge research findings, experimental results, and case studies in data science. Although Data Science is regarded as an interdisciplinary field that uses mathematics, statistics, databases, data mining, high-performance computing, knowledge management, and virtualization to discover knowledge from Big Data, it should have its own scientific content, such as axioms, laws, and rules, which are fundamentally important for experts in different fields exploring their own interests in Big Data. ADS encourages contributors to address such challenging problems on this exchange platform. At present, how to discover knowledge from heterogeneous data in a Big Data environment remains to be addressed. ADS is a series of volumes edited by either the editorial office or guest editors. Guest editors are responsible for the call for papers and the review process for high-quality contributions in their volumes.