ACM Computing Surveys: Latest Articles

Advancements in Federated Learning: Models, Methods, and Privacy
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-06-01 DOI: 10.1145/3664650
Huiming Chen, Huandong Wang, Qingyue Long, Depeng Jin, Yong Li

Federated learning (FL) is a promising technique for addressing rising privacy and security concerns. Its core idea is to learn a model cooperatively among distributed clients without uploading any sensitive data. In this paper, we conduct a thorough review of the related work, following the development of the field and examining the key technologies behind FL from the perspectives of theory and application. Specifically, we first classify existing work on FL architecture according to the network topology of FL systems, with detailed analysis and summarization. Next, we abstract the current application problems, summarize the general techniques, and frame the application problems into the general paradigm of FL base models. Moreover, we provide our proposed solutions for model training via FL. We summarize and analyze the existing FedOpt algorithms and reveal the algorithmic development principles of many first-order algorithms in depth, proposing a more generalized algorithm design framework. By instantiating these frameworks, FedOpt algorithms can be developed simply. As privacy and security are fundamental requirements in FL, we present the existing attack scenarios and defense methods. To the best of our knowledge, we are among the first to review the theoretical methodology and propose our own strategies, since very few works survey the theoretical approaches. Our survey aims to motivate the development of high-performance, privacy-preserving, and secure methods for integrating FL into real-world applications.
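
The idea of learning a shared model without uploading client data can be illustrated with a minimal sketch of federated averaging (FedAvg), a widely used baseline. This is not the framework proposed in the survey; the linear model, the learning rate, the function names (`local_update`, `fed_avg`, `train`), and the toy client data are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Vector:
    """Run a few gradient-descent steps on one client's private data for y = w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error on this client's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def fed_avg(client_weights: List[Vector], client_sizes: List[int]) -> Vector:
    """Server step: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def train(clients: List[Dataset], rounds: int = 200) -> Vector:
    """Alternate local training and server-side averaging; only weights are exchanged."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = fed_avg(updates, sizes)
    return global_weights


# Toy data (hypothetical): both clients observe the same relation y = 2x + 1.
clients = [
    [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
    [(0.25, 1.5), (0.75, 2.5)],
]
print(train(clients))
```

In this sketch the server only ever sees model weights, which is the property the abstract describes; it does not by itself defend against the attack scenarios the survey reviews.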

Citations: 0
Research Progress of EEG-Based Emotion Recognition: A Survey
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-28 DOI: 10.1145/3666002
Yiming Wang, Bin Zhang, Lamei Di

Emotion recognition based on electroencephalography (EEG) signals has emerged as a prominent research field, facilitating objective evaluation of diseases like depression and motion detection for healthy people. Starting from the basic concepts of temporal-frequency-spatial features in EEG and the methods for cross-domain feature fusion, this survey then extends the overfitting challenge of single-modal EEG to the problem of heterogeneous modality modeling in multi-modal conditions. It explores issues such as feature selection, sample scarcity, cross-subject emotional transfer, physiological knowledge discovery, multi-modal fusion methods, and missing modalities. These findings provide clues for researchers to further investigate emotion recognition based on EEG signals.

Citations: 0
A.I. Robustness: a Human-Centered Perspective on Technological Challenges and Opportunities
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-27 DOI: 10.1145/3665926
Andrea Tocchetti, Lorenzo Corti, Agathe Balayn, Mireia Yurrita, Philip Lippmann, Marco Brambilla, Jie Yang

Despite the impressive performance of Artificial Intelligence (AI) systems, their robustness remains elusive and constitutes a key issue that impedes large-scale adoption. Besides, robustness is interpreted differently across domains and contexts of AI. In this work, we systematically survey recent progress to provide a reconciled terminology of concepts around AI robustness. We introduce three taxonomies to organize and describe the literature both from a fundamental and applied point of view: 1) methods and approaches that address robustness in different phases of the machine learning pipeline; 2) methods improving robustness in specific model architectures, tasks, and systems; and in addition, 3) methodologies and insights around evaluating the robustness of AI systems, particularly the trade-offs with other trustworthiness properties. Finally, we identify and discuss research gaps and opportunities and give an outlook on the field. We highlight the central role of humans in evaluating and enhancing AI robustness, considering the necessary knowledge they can provide, and discuss the need for better understanding practices and developing supportive tools in the future.

Citations: 0
Human Image Generation: A Comprehensive Survey
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-22 DOI: 10.1145/3665869
Zhen Jia, Zhang Zhang, Liang Wang, Tieniu Tan

Image and video synthesis has become a flourishing topic in the computer vision and machine learning communities along with the development of deep generative models, owing to its great academic and application value. Humans are among the most commonly seen object categories in daily life, and many researchers have devoted themselves to synthesizing high-fidelity human images, with a large number of studies based on various models, task settings, and applications. It is therefore necessary to give a comprehensive overview of these varied methods for human image generation. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods, and hybrid methods. For each paradigm, the most representative models and their variants are presented, and the advantages and characteristics of the different methods are summarized in terms of model architectures. In addition, the main public human image datasets and evaluation metrics in the literature are summarized. Furthermore, given the wide application potential, the typical downstream uses of synthesized human images are covered. Finally, the challenges and potential opportunities of human image generation are discussed to shed light on future research.

Citations: 0
A Survey on Malware Detection with Graph Representation Learning
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-21 DOI: 10.1145/3664649
Tristan Bilot, Nour El Madhoun, Khaldoun Al Agha, Anis Zouaoui

Malware detection has become a major concern due to the increasing number and complexity of malware. Traditional detection methods based on signatures and heuristics are used for malware detection, but unfortunately, they suffer from poor generalization to unknown attacks and can be easily circumvented using obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep Learning (DL) achieved impressive results in malware detection by learning useful representations from data and have become a solution preferred over traditional methods. Recently, the application of Graph Representation Learning (GRL) techniques on graph-structured data has demonstrated impressive capabilities in malware detection. This success benefits notably from the robust structure of graphs, which are challenging for attackers to alter, and their intrinsic explainability capabilities. In this survey, we provide an in-depth literature review to summarize and unify existing works under the common approaches and architectures. We notably demonstrate that Graph Neural Networks (GNNs) reach competitive results in learning robust embeddings from malware represented as expressive graph structures such as Function Call Graphs (FCGs) and Control Flow Graphs (CFGs). This study also discusses the robustness of GRL-based methods to adversarial attacks, contrasts their effectiveness with other ML/DL approaches, and outlines future research for practical deployment.
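
To make the notion of learning embeddings from a Function Call Graph more concrete, here is a minimal sketch of one round of mean neighborhood aggregation followed by mean pooling, the basic step that most GNN layers build on. It omits learned weights and nonlinearities, and the graph, the two node features, and the function names (`aggregate`, `readout`) are hypothetical; it is not a method taken from the survey.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]        # function name -> functions it calls
Features = Dict[str, List[float]]   # function name -> feature vector


def aggregate(graph: Graph, feats: Features) -> Features:
    """One message-passing round: average a node's features with those of its callees."""
    out: Features = {}
    for node, callees in graph.items():
        group = [feats[node]] + [feats[c] for c in callees]
        dim = len(feats[node])
        out[node] = [sum(v[i] for v in group) / len(group) for i in range(dim)]
    return out


def readout(feats: Features) -> List[float]:
    """Mean-pool all node vectors into a single graph-level embedding."""
    nodes = list(feats.values())
    dim = len(nodes[0])
    return [sum(v[i] for v in nodes) / len(nodes) for i in range(dim)]


# Hypothetical FCG of a small binary; features are [relative size, uses network API].
fcg: Graph = {"main": ["decrypt", "connect"], "decrypt": [], "connect": []}
node_feats: Features = {
    "main": [1.0, 0.0],
    "decrypt": [4.0, 0.0],
    "connect": [1.0, 1.0],
}
embedding = readout(aggregate(fcg, node_feats))
print(embedding)
```

In a real detector the resulting graph embedding would be passed to a trained classifier; here it only shows how structural context from callees flows into each node's representation.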

Citations: 0
Causality for Trustworthy Artificial Intelligence: Status, Challenges and Perspectives
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-20 DOI: 10.1145/3665494
A. Rawal, Adrienne Raglin, Danda B. Rawat, Brian M. Sadler, J. McCoy
Causal inference concerns the idea of cause and effect; this fundamental area of science can be applied to problem spaces associated with Newton’s laws or the devastating COVID-19 pandemic. The cause explains the “why” whereas the effect describes the “what”. The domain itself encompasses a plethora of disciplines from statistics and computer science to economics and philosophy. Recent advancements in machine learning (ML) and artificial intelligence (AI) systems have nourished a renewed interest in identifying and estimating cause-and-effect relationships from the substantial amount of available observational data. This has resulted in various new studies aimed at providing novel methods for identifying and estimating causal effects. We include a detailed taxonomy of causal inference frameworks, methods, and evaluation. An overview of causality for security is also provided. Open challenges are detailed, and approaches for evaluating the robustness of causal inference methods are described. This paper aims to provide a comprehensive survey on such studies of causality. We provide an in-depth review of causality frameworks, and describe the different methods.
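
As a small illustration of estimating a cause-and-effect relationship from observational data, the sketch below compares a naive difference in means with a stratified (back-door adjusted) estimate. Stratification on a single observed confounder is one standard textbook estimator and is an assumption here, not a method attributed to the survey; the records are made-up toy values and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, int, float]  # (confounder z, treatment t, outcome y)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def naive_effect(records: List[Record]) -> float:
    """Difference in mean outcome between treated and untreated, ignoring the confounder."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(records: List[Record]) -> float:
    """Average the within-stratum effects, weighted by how common each stratum is.

    Assumes every stratum of z contains both treated and untreated records.
    """
    strata: Dict[int, Dict[int, List[float]]] = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in records:
        strata[z][t].append(y)
    total = len(records)
    effect = 0.0
    for groups in strata.values():
        size = len(groups[0]) + len(groups[1])
        effect += (size / total) * (mean(groups[1]) - mean(groups[0]))
    return effect


# Toy data (made up): z raises both the chance of treatment and the outcome,
# while the treatment itself adds exactly 1.0 to the outcome in each stratum.
records: List[Record] = (
    [(0, 1, 1.0)] + [(0, 0, 0.0)] * 3 + [(1, 1, 3.0)] * 3 + [(1, 0, 2.0)]
)
print(naive_effect(records), adjusted_effect(records))
```

On this toy data the naive comparison overstates the effect because the confounder is unevenly distributed across the treated and untreated groups, which is the kind of bias causal inference methods are designed to remove.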
Citations: 0
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-18 DOI: 10.1145/3664597
Yao Wan, Zhangqian Bi, Yang He, Jianguo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip Yu

Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. There is already a thriving research community focusing on code intelligence, with efforts spanning software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models on the basis of code representation learning, and provide a comprehensive overview to enhance comprehension of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models (https://xcodemind.github.io). Finally, we point out several challenging and promising directions for future research.

Citations: 0
Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and Tuning
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-17 DOI: 10.1145/3665323
M. Mozaffari, Anton Dignös, J. Gamper, U. Störl
Self-tuning is a feature of autonomic databases that includes the problem of automatic schema design. It aims at providing an optimized schema that increases the overall database performance. While in relational databases automatic schema design focuses on the automated design of the physical schema, in NoSQL databases all levels of representation are considered: conceptual, logical, and physical. This is mainly because the latter are mostly schema-less and lack a standard schema design procedure as is the case for SQL databases. In this work, we carry out a systematic literature survey on automatic schema design in both SQL and NoSQL databases. We identify the levels of representation and the methods that are used for the schema design problem, and we present a novel taxonomy to classify and compare different schema design solutions. Our comprehensive analysis demonstrates that, despite substantial progress that has been made, schema design is still a developing field and considerable challenges need to be addressed, notably for NoSQL databases. We highlight the most important findings from the results of our analysis and identify areas for future research work.
Citations: 0
A Unified Review of Deep Learning for Automated Medical Coding
IF 16.6 Zone 1 Computer Science Q1 Mathematics Pub Date: 2024-05-17 DOI: 10.1145/3664615
Shaoxiong Ji, Xiaobo Li, Wei Sun, Hang Dong, Ara Taalas, Yijia Zhang, Honghan Wu, Esa Pitkänen, Pekka Marttinen

Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning and natural language processing have been widely applied to this task. However, deep learning-based medical coding lacks a unified view of the design of neural network architectures. This review proposes a unified framework to provide a general understanding of the building blocks of medical coding models and summarizes recent advanced models under the proposed framework. Our unified framework decomposes medical coding into four main components, i.e., encoder modules for text feature extraction, mechanisms for building deep encoder architectures, decoder modules for transforming hidden representations into medical codes, and the usage of auxiliary information. Finally, we introduce the benchmarks and real-world usage and discuss key research challenges and future directions.

Citations: 0
The First Principles: Setting the Context for a Safe and Secure Metaverse
IF 16.6 Tier 1 Computer Science Q1 Mathematics Pub Date: 2024-05-17 DOI: 10.1145/3665495
Ankur Gupta, Sahil Sawhney, Kashyap Kompella
The metaverse delivered through converged and amalgamated technologies holds promise. No wonder technology heavyweights, large corporates, research organizations and businesses cutting across industry verticals are racing to put in place a metaverse-first strategy. The bets on consumers rapidly migrating from traditional social networks and collaborative applications to more immersive digital experiences have been placed. However, the transition is not expected to be seamless. Privacy, safety and security concerns abound in the early versions of the metaverse. Increased regulatory oversight and diverse national laws threaten to derail the hype around the metaverse. It is increasingly clear that the final iteration of the metaverse will need to assuage the concerns of individual users while addressing complex legal and regulatory requirements. Thus, a multi-perspective approach needs to be adopted to help set the agenda for the evolution of the metaverse. This research paper examines the different aspects and challenges which the future metaverse will need to address. A set of "first principles" are formulated, which if implemented will lead to the development of an equitable, inclusive, safe and secure metaverse.
Citations: 0