We cannot deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is badly needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models in multilingual perspectives. In addition, research on topic modeling in distributed environments and on topic visualization approaches is explored. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.
{"title":"Topic Modeling Using Latent Dirichlet allocation","authors":"Uttam Chauhan, Apurva Shah","doi":"10.1145/3462478","DOIUrl":"https://doi.org/10.1145/3462478","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"15 1","pages":"1 - 35"},"publicationDate":"2021-09-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"91072801"}
The explosive growth and widespread accessibility of medical information on the Internet have led to a surge of research activity in a wide range of scientific communities, including health informatics and information retrieval (IR). One of the common concerns of this research, across these disciplines, is how to design either clinical decision support systems or medical search engines capable of providing adequate support for both novices (e.g., patients and their next-of-kin) and experts (e.g., physicians, clinicians) tackling complex tasks (e.g., search for diagnosis, search for a treatment). However, despite significant multi-disciplinary research advances, current medical search systems exhibit low levels of performance. This survey provides an overview of the state of the art in the disciplines of IR and health informatics and, by bridging these disciplines, shows how semantic search techniques can facilitate medical IR. First, we will give a broad picture of semantic search and medical IR and then highlight the major scientific challenges. Second, focusing on the semantic gap challenge, we will discuss representative state-of-the-art work related to feature-based as well as semantic-based representation and matching models that support medical search systems. In addition to seminal works, we will present recent works that rely on research advancements in deep learning. Third, we make a thorough cross-model analysis and provide some findings and lessons learned. Finally, we discuss some open issues and promising directions for future research.
{"title":"Semantic Information Retrieval on Medical Texts","authors":"L. Tamine, L. Goeuriot","doi":"10.1145/3462476","DOIUrl":"https://doi.org/10.1145/3462476","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"50 1","pages":"1 - 38"},"publicationDate":"2021-09-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"88493764"}
Sophie Dramé-Maigné, M. Laurent-Maknavicius, Laurent Castillo, H. Ganem
The Internet of Things is taking hold in our everyday life. Regrettably, the security of IoT devices is often overlooked. Among the vast array of security issues plaguing the emerging IoT, we focus on access control, as privacy, trust, and other security properties cannot be achieved without controlled access. This article classifies IoT access control solutions from the literature according to their architecture (e.g., centralized, hierarchical, federated, distributed) and examines the suitability of each one for access control purposes. Our analysis concludes that important properties such as auditability and revocation are missing from many proposals, while hierarchical and federated architectures are neglected by the community. Finally, we provide an architecture-based taxonomy and future research directions: a focus on hybrid architectures, usability, flexibility, privacy, and revocation schemes in serverless authorization.
{"title":"Centralized, Distributed, and Everything in between","authors":"Sophie Dramé-Maigné, M. Laurent-Maknavicius, Laurent Castillo, H. Ganem","doi":"10.1145/3465170","DOIUrl":"https://doi.org/10.1145/3465170","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"13 1","pages":"1 - 34"},"publicationDate":"2021-09-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"77913050"}
Yohan Bonescki Gumiel, Lucas Emanuel Silva e Oliveira, V. Claveau, N. Grabar, E. Paraiso, C. Moro, D. Carvalho
Unstructured data in electronic health records, represented by clinical texts, are a vast source of healthcare information because they describe a patient's journey, including clinical findings, procedures, and information about the continuity of care. The publication of several studies on temporal relation extraction from clinical texts during the last decade and the organization of multiple shared tasks highlight the importance of this research theme. Therefore, we propose a review of temporal relation extraction in clinical texts. We analyzed 105 articles and found that relations between events and document creation time, a coarse temporality type, were addressed with traditional machine learning–based models, with few recent initiatives to push the state of the art with deep learning–based models. For temporal relations between entities (events and temporal expressions) in the document, factors such as dataset imbalance caused by candidate pair generation and task complexity directly affect the system's performance. The state of the art relies on attention-based models, with contextualized word representations being fine-tuned for temporal relation extraction. However, further experiments and advances in the research topic are required before real-time clinical domain applications can be released. Furthermore, most of the publications rely mainly on the same dataset, highlighting the need for new annotation projects that provide datasets for different medical specialties, clinical text types, and even languages.
{"title":"Temporal Relation Extraction in Clinical Texts","authors":"Yohan Bonescki Gumiel, Lucas Emanuel Silva e Oliveira, V. Claveau, N. Grabar, E. Paraiso, C. Moro, D. Carvalho","doi":"10.1145/3462475","DOIUrl":"https://doi.org/10.1145/3462475","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"92 1","pages":"1 - 36"},"publicationDate":"2021-09-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"80472007"}
C. Berger, Philipp Eichhammer, Hans P. Reiser, Jörg Domaschka, F. Hauck, Gerhard Habiger
Internet-of-Things (IoT) ecosystems tend to grow in both scale and complexity, as they consist of a variety of heterogeneous devices that span multiple architectural IoT layers (e.g., cloud, edge, sensors). Further, IoT systems increasingly demand the resilient operability of services, as they become part of critical infrastructures. This has led to a broad variety of research works that aim to increase the resilience of these systems. In this article, we create a systematization of knowledge about existing scientific efforts to make IoT systems resilient. In particular, we first discuss the taxonomy and classification of resilience and resilience mechanisms and subsequently survey state-of-the-art resilience mechanisms that have been proposed in the research literature and are applicable to IoT. As part of the survey, we also discuss questions that focus on the practical aspects of resilience, e.g., which constraints a specific resilience mechanism imposes on developers who incorporate it into an IoT system.
{"title":"A Survey on Resilience in the IoT","authors":"C. Berger, Philipp Eichhammer, Hans P. Reiser, Jörg Domaschka, F. Hauck, Gerhard Habiger","doi":"10.1145/3462513","DOIUrl":"https://doi.org/10.1145/3462513","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"33 1","pages":"1 - 39"},"publicationDate":"2021-09-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"89614420"}
Luciano Ignaczak, Guilherme Goldschmidt, C. Costa, R. Righi
The growth of data volume has changed cybersecurity activities, demanding a higher level of automation. In this new cybersecurity landscape, text mining has emerged as a way to improve the efficiency of activities involving unstructured data. This article presents a Systematic Literature Review (SLR) of the application of text mining in the cybersecurity domain. Using a systematic protocol, we identified 2,196 studies, out of which 83 were summarized. As a contribution, we propose a taxonomy of the different activities in the cybersecurity domain supported by text mining. We also detail the strategies evaluated in the application of text mining tasks and the use of neural networks to support activities involving unstructured data. The work also discusses text classification performance with a view to its application in real-world solutions. The SLR also highlights open gaps for future research, such as the analysis of non-English content and more intensive use of neural networks.
{"title":"Text Mining in Cybersecurity","authors":"Luciano Ignaczak, Guilherme Goldschmidt, C. Costa, R. Righi","doi":"10.1145/3462477","DOIUrl":"https://doi.org/10.1145/3462477","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"13 1","pages":"1 - 36"},"publicationDate":"2021-07-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"85214875"}
Lefeng Zhang, Tianqing Zhu, P. Xiong, Wanlei Zhou, Philip S. Yu
The vast majority of artificial intelligence solutions are founded on game theory, and differential privacy is emerging as perhaps the most rigorous and widely adopted privacy paradigm in the field. However, alongside all the advancements made in both these fields, there is not a single application that is not still vulnerable to privacy violations, security breaches, or manipulation by adversaries. Our understanding of the interactions between differential privacy and game-theoretic solutions is limited. Hence, we undertook a comprehensive review of the literature in the field, finding that differential privacy has several advantageous properties that can contribute more to game theory than just privacy protection. It can also be used to build heuristic models for game-theoretic solutions, to avert strategic manipulations, and to quantify the cost of privacy protection. With a focus on mechanism design, the aim of this article is to provide a new perspective on the currently held impossibilities in game theory, potential avenues to circumvent those impossibilities, and opportunities to improve the performance of game-theoretic solutions with differentially private techniques.
{"title":"More than Privacy","authors":"Lefeng Zhang, Tianqing Zhu, P. Xiong, Wanlei Zhou, Philip S. Yu","doi":"10.1145/3460771","DOIUrl":"https://doi.org/10.1145/3460771","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"15 1","pages":"1 - 37"},"publicationDate":"2021-07-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"81862035"}
P. Oikonomou, Anna Karanika, C. Anagnostopoulos, Kostas Kolomvatsos
Nowadays, we are witnessing the advent of the Internet of Things (IoT), with numerous devices interacting with each other or with their environment. The huge number of devices leads to huge volumes of data that demand appropriate processing. The “legacy” approach is to rely on the Cloud, where increased computational resources can realize any desired processing. However, supporting real-time applications requires reduced latency in the provision of outcomes. Edge Computing (EC) comes as the “solver” of the latency problem. Various processing activities can be performed at EC nodes that have a direct connection with IoT devices. A number of challenges must be met before we arrive at a fully automated ecosystem where nodes can cooperate or understand their status to efficiently serve applications. In this article, we survey the relevant research activities towards the vision of Edge Mesh (EM), i.e., a “cover” of intelligence upon the EC. We present the necessary hardware and discuss research outcomes in every aspect of the functioning of EC/EM nodes. We present technologies and theories adopted for data, task, and resource management while discussing how machine learning and optimization can be adopted in the domain.
{"title":"On the Use of Intelligent Models towards Meeting the Challenges of the Edge Mesh","authors":"P. Oikonomou, Anna Karanika, C. Anagnostopoulos, Kostas Kolomvatsos","doi":"10.1145/3456630","DOIUrl":"https://doi.org/10.1145/3456630","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"42 1","pages":"1 - 42"},"publicationDate":"2021-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"82482117"}
The adoption of network traffic encryption is continually growing. Popular applications use encryption protocols to secure communications and protect the privacy of users. In addition, a large portion of malware spreads through network traffic, taking advantage of encryption protocols to hide its presence and activity. As we enter the era of completely encrypted communications over the Internet, we must rapidly start reviewing the state of the art in the wide domain of network traffic analysis and inspection, to determine whether traditional traffic processing systems will be able to seamlessly adapt to the upcoming full adoption of network encryption. In this survey, we examine the literature that deals with network traffic analysis and inspection after the ascent of encryption in communication channels. We observe that the research community has already started proposing solutions for performing inspection even when the network traffic is encrypted, and we present and review these works. In addition, we present the techniques and methods that these works use and their limitations. Finally, we examine the countermeasures that have been proposed in the literature to circumvent traffic analysis techniques that aim to harm user privacy.
{"title":"A Survey on Encrypted Network Traffic Analysis Applications, Techniques, and Countermeasures","authors":"Eva Papadogiannaki, S. Ioannidis","doi":"10.1145/3457904","DOIUrl":"https://doi.org/10.1145/3457904","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"33 1","pages":"1 - 35"},"publicationDate":"2021-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"88841368"}
The past four years have witnessed the rapid development of federated learning (FL). However, new privacy concerns have also emerged during the aggregation of the distributed intermediate results. The emerging privacy-preserving FL (PPFL) has been heralded as a solution to generic privacy-preserving machine learning. However, the challenge of protecting data privacy while maintaining data utility through machine learning remains. In this article, we present a comprehensive and systematic survey of PPFL based on our proposed 5W-scenario-based taxonomy. We analyze the privacy leakage risks in FL from five aspects, summarize existing methods, and identify future research directions.
{"title":"A Comprehensive Survey of Privacy-preserving Federated Learning","authors":"Xuefei Yin, Yanming Zhu, Jiankun Hu","doi":"10.1145/3460427","DOIUrl":"https://doi.org/10.1145/3460427","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"24 1","pages":"1 - 36"},"publicationDate":"2021-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"74223266"}