Collaborative Distributed Machine Learning
David Jin, Niclas Kannengießer, Sascha Rank, Ali Sunyaev
ACM Computing Surveys, 2024-11-20. DOI: 10.1145/3704807

Various collaborative distributed machine learning (CDML) systems with different key traits, including federated learning systems and swarm learning systems, have been developed to leverage distributed resources for developing and using machine learning (ML) models in a confidentiality-preserving way. To meet use case requirements, a suitable CDML system must be selected, yet comparing CDML systems to assess their suitability for use cases is often difficult. To support such comparison and to introduce scientific and practical audiences to the principal functioning and key traits of CDML systems, this work presents a CDML system conceptualization and a set of CDML archetypes.
Motivations, Challenges, Best Practices, and Benefits for Bots and Conversational Agents in Software Engineering: A Multivocal Literature Review
Stefano Lambiase, Gemma Catolino, Fabio Palomba, Filomena Ferrucci
ACM Computing Surveys, 2024-11-20. DOI: 10.1145/3704806

Bots are software systems designed to support users by automating specific processes, tasks, or activities. When these systems implement a conversational component to interact with users, they are also known as conversational agents or chatbots. Bots, particularly in their conversational, AI-powered form, have seen increasing adoption for software development and engineering purposes. Despite their potential, further enhanced by the advent of Generative AI and Large Language Models, bots still face challenges in development and in integration into the development cycle: practitioners report that bots can add difficulties rather than provide improvements. In this work, we provide a taxonomy for characterizing bots, as well as a series of challenges for their adoption in software engineering, accompanied by potential mitigation strategies. To achieve these objectives, we conducted a multivocal literature review examining both research and practitioner literature. Through this approach, we aim to serve both researchers and practitioners by providing (i) a series of future research directions to pursue, (ii) a list of strategies for improving the use of bots for software engineering purposes, and (iii) support for technology and knowledge transfer from research to practice, one of the primary goals of multivocal literature reviews.
Private and Secure Distributed Deep Learning: A Survey
Corinne Allaart, Saba Amiri, Henri Bal, Adam Belloum, Leon Gommans, Aart van Halteren, Sander Klous
ACM Computing Surveys, 2024-11-16. DOI: 10.1145/3703452

Traditionally, deep learning practitioners bring data into a central repository for model training and inference. Recent developments in distributed learning, such as federated learning and deep learning as a service (DLaaS), do not require centralized data and instead push computation to where the distributed datasets reside. These decentralized training schemes, however, introduce additional security and privacy challenges. This survey first structures the field of distributed learning into two main paradigms and then provides an overview of recently published protective measures for each, highlighting both secure training methods and private inference measures. Our analysis shows that recent publications, while highly dependent on the problem definition, report progress in terms of security, privacy, and efficiency. Nevertheless, we also identify several open issues in the private and secure distributed deep learning (PSDDL) field that require more research; we discuss these issues and provide a general overview of how they might be resolved.
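One family of protective measures for the federated paradigm is secure aggregation, where pairwise random masks cancel in the server-side sum so that individual client updates stay hidden. The following pure-Python sketch illustrates only the masking idea, not any specific protocol from the survey; the function name, the toy modulus, and the integer seeding (which stands in for a real pairwise key agreement) are illustrative assumptions.

```python
import random

def secure_aggregate(client_updates, modulus=2**31):
    """Toy additive-masking secure aggregation: each pair of clients
    shares a random mask; one adds it, the other subtracts it, so the
    masks cancel in the sum while each masked update looks random."""
    n = len(client_updates)
    dim = len(client_updates[0])
    masked = [list(u) for u in client_updates]
    for i in range(n):
        for j in range(i + 1, n):
            # In a real protocol this seed would come from a pairwise
            # key agreement between clients i and j, not a fixed formula.
            rng = random.Random(i * 1000 + j)
            for k in range(dim):
                m = rng.randrange(modulus)
                masked[i][k] = (masked[i][k] + m) % modulus
                masked[j][k] = (masked[j][k] - m) % modulus
    # The server only sees masked vectors; their sum equals the true sum.
    return [sum(col) % modulus for col in zip(*masked)]
```

Summing the masked vectors recovers exactly the sum of the plain updates, e.g. `secure_aggregate([[1, 2], [3, 4], [5, 6]])` yields `[9, 12]`.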
Backdoor Attacks and Defenses Targeting Multi-Domain AI Models: A Comprehensive Review
Shaobo Zhang, Yimeng Pan, Qin Liu, Zheng Yan, Kim-Kwang Raymond Choo, Guojun Wang
ACM Computing Surveys, 2024-11-15. DOI: 10.1145/3704725

Since security concerns emerged in artificial intelligence (AI), significant attention has been devoted to backdoor attacks, which attackers can use to manipulate model predictions with significant potential for harm. However, current research on backdoor attacks and defenses, both theoretical and practical, still has many shortcomings. To systematically analyze these shortcomings and address the lack of comprehensive reviews, this paper presents a comprehensive and systematic summary of backdoor attacks and defenses targeting multi-domain AI models. Based on the design principles and shared characteristics of triggers in different domains, and on the implementation stages of backdoor defense, it proposes a new classification method for backdoor attacks and defenses. Using this method, we extensively review backdoor attacks in computer vision and natural language processing, examine their current applications in audio recognition, video action recognition, multimodal tasks, time series tasks, generative learning, and reinforcement learning, and critically analyze the open problems of the various backdoor attack techniques and defense strategies. Finally, building on this analysis of the current state of AI security, the paper explores potential future research directions for backdoor attacks and defenses.
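To make the trigger notion concrete: in the classic computer-vision setting, an attacker stamps a small pixel pattern onto a fraction of the training images and relabels them to a target class, so the trained model associates the pattern with that class. The sketch below is a generic illustration of this BadNets-style poisoning step, not a method from the review; the function names and the poisoning rate are illustrative assumptions.

```python
def add_trigger(image, trigger_value=255, patch=2):
    """Stamp a small bright square in the bottom-right corner of a
    2-D image (list of rows) -- a toy pixel-pattern trigger."""
    poisoned = [row[:] for row in image]  # copy, leave the original intact
    rows, cols = len(poisoned), len(poisoned[0])
    for r in range(rows - patch, rows):
        for c in range(cols - patch, cols):
            poisoned[r][c] = trigger_value
    return poisoned

def poison_dataset(images, labels, target_label, rate=0.1):
    """Stamp the trigger onto a fraction of the training set and relabel
    those examples to the attacker's target class; a model trained on
    the result learns to map the trigger to target_label."""
    n_poison = max(1, int(len(images) * rate))
    poisoned_imgs, poisoned_lbls = [], []
    for idx, (img, lbl) in enumerate(zip(images, labels)):
        if idx < n_poison:
            poisoned_imgs.append(add_trigger(img))
            poisoned_lbls.append(target_label)
        else:
            poisoned_imgs.append(img)
            poisoned_lbls.append(lbl)
    return poisoned_imgs, poisoned_lbls
```

At inference time the model behaves normally on clean inputs; stamping the same patch onto any input flips its prediction to the target class, which is what makes the attack hard to spot by accuracy testing alone.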
Systematic Review of Generative Modelling Tools and Utility Metrics for Fully Synthetic Tabular Data
Anton Danholt Lautrup, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp
ACM Computing Surveys, 2024-11-14. DOI: 10.1145/3704437

Sharing data with third parties is essential for advancing science, but it is becoming increasingly difficult with the rise of data protection regulations, ethical restrictions, and growing fear of misuse. Fully synthetic data, which goes beyond anonymisation, may be the key to unlocking valuable untapped insights stored away in secured data vaults. This review examines current synthetic data generation methods and the measurement of their utility. We find that more traditional generative models, such as Classification and Regression Tree (CART) models and Bayesian networks, remain highly relevant and can still surpass deep learning alternatives like Generative Adversarial Networks. However, our findings also reveal the same lack of agreement on evaluation metrics uncovered in earlier reviews, a persistent obstacle to advancing the field. We propose a tool for evaluating the utility of synthetic data and illustrate how it can be applied to three synthetic data generation models. By streamlining evaluation and promoting agreement on metrics, researchers can explore novel methods and generate compelling results that will convince data curators and lawmakers to embrace synthetic data. Our review emphasises the potential of synthetic data and highlights the need for greater collaboration and standardisation to unlock its full potential.
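The "traditional" generators mentioned above share a common recipe: fit simple conditional distributions from the real table, then draw fresh records by ancestral sampling, so no real record is ever released. A minimal two-column Bayesian-network sketch of that recipe (illustrative only; the function names and the X -> Y structure are my assumptions, not the review's tool):

```python
import random
from collections import Counter, defaultdict

def fit_chain(rows):
    """Fit a two-node Bayesian network X -> Y from (x, y) records:
    the marginal P(X) and the conditional P(Y | X), both by counting."""
    p_x = Counter(x for x, _ in rows)
    p_y_given_x = defaultdict(Counter)
    for x, y in rows:
        p_y_given_x[x][y] += 1
    return p_x, p_y_given_x

def sample_chain(p_x, p_y_given_x, n, seed=0):
    """Generate fully synthetic records by ancestral sampling:
    draw x from P(X), then y from P(Y | X = x)."""
    rng = random.Random(seed)
    xs, x_weights = zip(*p_x.items())
    out = []
    for _ in range(n):
        x = rng.choices(xs, weights=x_weights)[0]
        ys, y_weights = zip(*p_y_given_x[x].items())
        out.append((x, rng.choices(ys, weights=y_weights)[0]))
    return out
```

CART-based generators follow the same pattern column by column, replacing the count tables with fitted decision trees; utility metrics then compare statistics of the sampled table against the real one.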
Democratizing Container Live Migration for Enhanced Future Networks - A Survey
Wissem Soussi, Gürkan Gür, Burkhard Stiller
ACM Computing Surveys, 2024-11-14. DOI: 10.1145/3704436

Emerging cloud-centric networks span from edge clouds to large-scale datacenters, with infrastructure shared among multiple tenants and applications that demand high availability, isolation, fault tolerance, security, and energy efficiency. Live migration (LiMi) plays an increasingly critical role in these environments by enabling seamless application mobility across the edge-to-cloud continuum while maintaining these requirements. This work presents a comprehensive survey of recent advancements that democratize LiMi, making it applicable to a broader range of scenarios and network environments for both virtual machines (VMs) and containers, and analyzes LiMi's technical underpinnings and optimization techniques. It also delves into the issue of connection handover, presenting a taxonomy of traffic redirection methods synthesized from the existing literature. Finally, it identifies technical challenges and paves the way for future research directions in this key technology.
Membership Inference Attacks and Defenses in Federated Learning: A Survey
Li Bai, Haibo Hu, Qingqing Ye, Haoyang Li, Leixia Wang, Jianliang Xu
ACM Computing Surveys, 2024-11-14. DOI: 10.1145/3704633

Federated learning is a decentralized machine learning approach in which clients train models locally and share model updates to develop a global model. This enables low-resource devices to collaboratively build a high-quality model without requiring direct access to the raw training data. However, despite sharing only model updates, federated learning still faces several privacy vulnerabilities. One of the key threats is membership inference attacks, which target clients' privacy by determining whether a specific example is part of the training set. Such attacks can compromise sensitive information in real-world applications, such as medical diagnoses within a healthcare system. Although membership inference attacks have been studied extensively, a comprehensive and up-to-date survey focused on them within federated learning is still absent. To fill this gap, we categorize and summarize membership inference attacks and their corresponding defense strategies based on their characteristics in this setting. We introduce a taxonomy of existing attack research, provide a systematic overview of countermeasures, and thoroughly analyze the strengths and weaknesses of the different approaches. Finally, we identify and discuss key future research directions for readers interested in advancing the field.
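The intuition behind the simplest membership inference attacks is that a model fits its training examples better than unseen ones, so an unusually low loss on a candidate example suggests membership. A loss-threshold sketch in that spirit (a generic illustration, not a specific attack from the survey; function names and the threshold value are assumptions):

```python
import math

def cross_entropy(prob_true_class):
    """Per-example cross-entropy loss from the probability the model
    assigns to the example's true class."""
    return -math.log(max(prob_true_class, 1e-12))

def loss_threshold_attack(losses, threshold):
    """Predict 'member' for every example whose loss falls below the
    threshold: the model fits training-set members unusually well."""
    return [loss <= threshold for loss in losses]
```

For instance, an example the model predicts with probability 0.99 has loss about 0.01 and is flagged as a member under a threshold of 0.1, while one predicted at 0.60 (loss about 0.51) is not. In federated learning the same idea can be applied per round, since clients and the server both observe intermediate model updates.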
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Zhihong Liu, Xin Xu, Peng Qiao, DongSheng Li
ACM Computing Surveys, 2024-11-14. DOI: 10.1145/3703453

Deep reinforcement learning has led to dramatic breakthroughs in artificial intelligence over the past few years. As the amount of rollout experience data and the size of neural networks for deep reinforcement learning continue to grow, handling the training process and reducing time consumption with parallel and distributed computing has become an urgent and essential need. In this paper, we perform a broad and thorough investigation of training acceleration methodologies for deep reinforcement learning based on parallel and distributed computing, providing a comprehensive survey of the field with state-of-the-art methods and pointers to core references. In particular, we provide a taxonomy of the literature, along with a discussion of emerging topics and open issues, covering learning system architectures, simulation parallelism, computing parallelism, distributed synchronization mechanisms, and deep evolutionary reinforcement learning. Further, we compare 16 current open-source libraries and platforms against criteria for facilitating rapid development. Finally, we extrapolate future directions that deserve further research.
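The simulation-parallelism idea at the heart of such systems is to fan rollouts out across independent workers and gather their trajectories into one experience batch for the learner. A minimal sketch of that pattern, using a toy random-walk environment and thread workers (the environment, function names, and worker count are illustrative assumptions; production systems use processes or separate machines):

```python
from concurrent.futures import ThreadPoolExecutor
import random

def rollout(worker_seed, horizon=50):
    """One worker collects a trajectory from a toy 1-D random-walk
    environment and returns its (state, action, reward) transitions."""
    rng = random.Random(worker_seed)
    state, traj = 0, []
    for _ in range(horizon):
        action = rng.choice([-1, 1])
        next_state = state + action
        reward = 1.0 if next_state == 0 else 0.0  # reward for returning home
        traj.append((state, action, reward))
        state = next_state
    return traj

def parallel_rollouts(n_workers=4, horizon=50):
    """Fan rollouts out across workers and gather one experience batch --
    the basic actor/learner pattern behind systems like A3C or IMPALA."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(rollout, seed, horizon) for seed in range(n_workers)]
        return [f.result() for f in futures]
```

The learner then consumes the batch to update the policy; the synchronization mechanisms surveyed above differ mainly in whether workers wait for the new policy (synchronous) or keep acting on a stale one (asynchronous).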
A Survey on Security of UAV Swarm Networks: Attacks and Countermeasures
Xiaojie Wang, Zhonghui Zhao, Ling Yi, Zhaolong Ning, Lei Guo, F. Richard Yu, Song Guo
ACM Computing Surveys, 2024-11-08. DOI: 10.1145/3703625

The increasing popularity of Unmanned Aerial Vehicle (UAV) swarms is attributed to their ability to generate substantial returns for various industries at low cost. Additionally, in the future landscape of wireless networks, UAV swarms can serve as airborne base stations, alleviating the scarcity of communication resources. However, UAV swarm networks are vulnerable to various security threats that attackers can exploit with unpredictable consequences. Against this background, this paper provides a comprehensive review of the security of UAV swarm networks. We begin by briefly introducing the dominant UAV swarm technologies, followed by their civilian and military applications. We then present and categorize the potential attacks that UAV swarm networks may encounter, such as denial-of-service attacks, man-in-the-middle attacks, and attacks against Machine Learning (ML) models. After that, we introduce security technologies that can be utilized to address these attacks, including cryptography, physical-layer security techniques, blockchain, ML, and intrusion detection. Additionally, we investigate and summarize mitigation strategies for the different security threats in UAV swarm networks. Finally, research directions and challenges are discussed.
Security and Privacy on Generative Data in AIGC: A Survey
Tao Wang, Yushu Zhang, Shuren Qi, Ruoyu Zhao, Xia Zhihua, Jian Weng
ACM Computing Surveys, 2024-11-07. DOI: 10.1145/3703626

The advent of artificial intelligence-generated content (AIGC) represents a pivotal moment in the evolution of information technology. With AIGC, it is effortless to generate high-quality data that the public finds challenging to distinguish from authentic content. Nevertheless, the proliferation of generative data across cyberspace brings security and privacy issues, including privacy leakage of individuals and media forgery for fraudulent purposes. Consequently, both academia and industry have begun to emphasize the trustworthiness of generative data, successively providing a series of countermeasures for security and privacy. In this survey, we systematically review the security and privacy of generative data in AIGC, analyzing them, for the first time, from the perspective of information security properties. Specifically, we distill the successful experiences of state-of-the-art countermeasures in terms of the foundational properties of privacy, controllability, authenticity, and compliance. Finally, we show representative benchmarks, present a statistical analysis, and summarize potential exploration directions for each of these properties.