As large software projects grow rapidly, automatic code summarization techniques, which describe the main functionality of a piece of code as natural-language comments, play an essential role in helping developers understand and maintain such projects. Many research efforts have been devoted to building automatic code summarization approaches. Typical approaches are based on deep learning models that cast the task as a sequence-to-sequence problem: they take source code as input and output a natural-language summary. Code summarization models impose different input size limits on the source code, ranging from 50 to 10,000. However, how the input size limit affects the performance of code summarization models remains under-explored. In this paper, we first conduct an empirical study to investigate the impact of different input size limits on the quality of generated code comments. To our surprise, experiments on multiple models and datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of the generated comments.
Based on this finding, we further propose to use function signatures instead of full source code: we first extract the function signatures, which summarize the main functionality, and then feed them into code summarization models. Experiments and statistical analysis show that inputs with signatures are, on average, more than 2 percentage points better than inputs without signatures, demonstrating the effectiveness of involving function signatures in code summarization. We also invite programmers to complete a questionnaire evaluating the quality of summaries generated from the two kinds of inputs at different truncation levels. The results show that function signatures yield, on average, 9.2% more high-quality comments than full code.
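To make the idea concrete, the sketch below (in Python, purely for illustration; the paper's datasets, programming languages, and preprocessing pipeline may differ) extracts a function's signature tokens and applies a small input size limit before the tokens would be handed to a summarization model. All names here are hypothetical.

```python
import ast

def signature_tokens(source: str) -> list[str]:
    """Collect signature tokens (name, parameters, annotations, return type)
    from a Python function definition, discarding the body."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    tokens = [func.name]
    for arg in func.args.args:
        tokens.append(arg.arg)
        if arg.annotation is not None:
            tokens.append(ast.unparse(arg.annotation))
    if func.returns is not None:
        tokens.append(ast.unparse(func.returns))
    return tokens

def truncate(tokens: list[str], limit: int = 20) -> list[str]:
    """Apply a low input size limit, mirroring the truncation settings studied."""
    return tokens[:limit]

if __name__ == "__main__":
    code = "def load_config(path: str, strict: bool = True) -> dict:\n    ...\n"
    print(truncate(signature_tokens(code)))
    # ['load_config', 'path', 'str', 'strict', 'bool', 'dict']
```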
{"title":"Do Code Summarization Models Process Too Much Information? Function Signature May Be All What Is Needed","authors":"Xi Ding, Rui Peng, Xiangping Chen, Yuan Huang, Jing Bian, Zibin Zheng","doi":"10.1145/3652156","DOIUrl":"https://doi.org/10.1145/3652156","url":null,"abstract":"<p>With the fast development of large software projects, automatic code summarization techniques, which summarize the main functionalities of a piece of code using natural languages as comments, play essential roles in helping developers understand and maintain large software projects. Many research efforts have been devoted to building automatic code summarization approaches. Typical code summarization approaches are based on deep learning models. They transform the task into a sequence-to-sequence task, which inputs source code and outputs summarizations in natural languages. All code summarization models impose different input size limits, such as 50 to 10,000, for the input source code. However, how the input size limit affects the performance of code summarization models still remains under-explored. In this paper, we first conduct an empirical study to investigate the impacts of different input size limits on the quality of generated code comments. To our surprise, experiments on multiple models and datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of generated comments. </p><p>Based on this finding, we further propose to use function signatures instead of full source code to summarize the main functionalities first and then input the function signatures into code summarization models. Experiments and statistical results show that inputs with signatures are, on average, more than 2 percentage points better than inputs without signatures and thus demonstrate the effectiveness of involving function signatures in code summarization. We also invite programmers to do a questionnaire to evaluate the quality of code summaries generated by two inputs with different truncation levels. The results show that function signatures generate, on average, 9.2% more high-quality comments than full code.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"2 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140127520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to its importance and widespread use in industry, automated testing of REST APIs has attracted major interest from the research community in the last few years. However, most of the work in the literature has focused on black-box fuzzing. Although existing fuzzers have been used to automatically find many faults in existing APIs, several open research challenges still hinder the achievement of better results (e.g., in terms of code coverage and fault finding). For example, under-specified schemas are a major issue for black-box fuzzers. Currently, EvoMaster is the only existing tool that supports white-box fuzzing of REST APIs. In this paper, we present a series of novel white-box heuristics, including, for example, how to deal with under-specified constraints in API schemas as well as under-specified schemas in SQL databases. Our novel techniques are implemented as an extension to our open-source, search-based fuzzer EvoMaster. An empirical study on 14 APIs from the EMB corpus, plus one industrial API, shows clear improvements in the results for several of these APIs.
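To illustrate what "under-specified" means here, the hypothetical Python fragment below declares an id path parameter only as an integer in an OpenAPI-style schema, while the handler additionally requires it to be positive and to exist in the database. A black-box fuzzer that reads only the schema cannot discover these extra constraints, whereas white-box heuristics can observe them in the code and the database state. The schema fragment and handler are invented for this example and are not taken from EvoMaster or the studied APIs.

```python
# Hypothetical OpenAPI-style fragment: `id` is declared only as an integer.
OPENAPI_FRAGMENT = {
    "/items/{id}": {
        "get": {
            "parameters": [
                {"name": "id", "in": "path", "required": True,
                 "schema": {"type": "integer"}}  # no minimum, no existence constraint
            ]
        }
    }
}

def get_item(item_id: int, db: dict):
    """Handler enforcing constraints that the schema above never mentions."""
    if item_id <= 0:        # under-specified constraint: must be positive
        raise ValueError("id must be positive")
    if item_id not in db:   # under-specified constraint: depends on database state
        raise KeyError("no such item")
    return db[item_id]
```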
{"title":"Advanced White-Box Heuristics for Search-Based Fuzzing of REST APIs","authors":"Andrea Arcuri, Man Zhang, Juan Pablo Galeotti","doi":"10.1145/3652157","DOIUrl":"https://doi.org/10.1145/3652157","url":null,"abstract":"<p>Due to its importance and widespread use in industry, automated testing of REST APIs has attracted major interest from the research community in the last few years. However, most of the work in the literature has been focused on black-box fuzzing. Although existing fuzzers have been used to automatically find many faults in existing APIs, there are still several open research challenges that hinder the achievement of better results (e.g., in terms of code coverage and fault finding). For example, under-specified schemas are a major issue for black-box fuzzers. Currently, <span>EvoMaster</span> is the only existing tool that supports white-box fuzzing of REST APIs. In this paper, we provide a series of novel white-box heuristics, including for example how to deal with under-specified constrains in API schemas, as well as under-specified schemas in SQL databases. Our novel techniques are implemented as an extension to our open-source, search-based fuzzer <span>EvoMaster</span>. An empirical study on 14 APIs from the EMB corpus, plus one industrial API, shows clear improvements of the results in some of these APIs.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"89 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Source code authorship attribution is an important problem in practical applications such as plagiarism detection, software forensics, and copyright disputes. Recent studies show that existing methods for source code authorship attribution can be significantly affected by time evolution, leading to a year-by-year decrease in attribution accuracy. To alleviate the accuracy degradation that Deep Learning (DL)-based source code authorship attribution suffers under time evolution, we propose a new framework called Time Domain Adaptation (TimeDA), which adds new feature extractors to the original DL-based code attribution framework and enhances the original model's ability to learn source-domain features without requiring new or additional source data. Moreover, we employ a centroid-based pseudo-labeling strategy using neighborhood clustering entropy for adaptive learning to improve the robustness of DL-based code authorship attribution. Experimental results show that TimeDA can significantly enhance the robustness of DL-based source code authorship attribution to time evolution, with an average improvement of 8.7% on the Java dataset and 5.2% on the C++ dataset. In addition, TimeDA benefits from the centroid-based pseudo-labeling strategy, which reduces model training time by 87.3% compared to traditional unsupervised domain adaptation methods.
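The sketch below illustrates the general idea of centroid-based pseudo-labeling in feature space, assuming NumPy arrays of extracted code features; TimeDA's actual design (its feature extractors, the use of neighborhood clustering entropy, and the training loop) is more involved, and all names below are illustrative.

```python
import numpy as np

def centroid_pseudo_labels(src_feats, src_labels, tgt_feats, n_classes):
    """Assign pseudo-labels to target-domain samples by nearest source-class centroid.

    A minimal sketch: the entropy of the soft assignment is also returned so that
    unreliable pseudo-labels can be filtered before adaptive training.
    """
    # Class centroids computed from labeled source-domain features.
    centroids = np.stack([src_feats[src_labels == c].mean(axis=0)
                          for c in range(n_classes)])
    # Euclidean distance from every target sample to every centroid.
    dists = np.linalg.norm(tgt_feats[:, None, :] - centroids[None, :, :], axis=-1)
    pseudo = dists.argmin(axis=1)
    # Soft assignment over classes; high entropy marks uncertain samples.
    probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return pseudo, entropy
```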
{"title":"Reducing the Impact of Time Evolution on Source Code Authorship Attribution via Domain Adaptation","authors":"Zhen Li, Shasha Zhao, Chen Chen, Qian Chen","doi":"10.1145/3652151","DOIUrl":"https://doi.org/10.1145/3652151","url":null,"abstract":"<p>Source code authorship attribution is an important problem in practical applications such as plagiarism detection, software forensics, and copyright disputes. Recent studies show that existing methods for source code authorship attribution can be significantly affected by time evolution, leading to a decrease in attribution accuracy year by year. To alleviate the problem that Deep Learning (DL)-based source code authorship attribution degrading in accuracy due to time evolution, we propose a new framework called <underline>Time</underline> <underline>D</underline>omain <underline>A</underline>daptation (TimeDA) by adding new feature extractors to the original DL-based code attribution framework that enhances the learning ability of the original model on source domain features without requiring new or more source data. Moreover, we employ a centroid-based pseudo-labeling strategy using neighborhood clustering entropy for adaptive learning to improve the robustness of DL-based code authorship attribution. Experimental results show that TimeDA can significantly enhance the robustness of DL-based source code authorship attribution to time evolution, with an average improvement of 8.7% on the Java dataset and 5.2% on the C++ dataset. In addition, our TimeDA benefits from employing the centroid-based pseudo-labeling strategy, which significantly reduced the model training time by 87.3% compared to traditional unsupervised domain adaptive methods.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"89 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in the realm of fairness testing. We also identify widely-adopted datasets and open-source tools for fairness testing.
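As one concrete instance of what such testing can look like, the sketch below implements a common individual-fairness oracle: flip a protected attribute and check whether the model's prediction changes. It assumes a hypothetical `model.predict` interface over feature dictionaries and illustrates only one of the many workflows and metrics covered by the survey.

```python
import copy

def flip_protected(sample: dict, attr: str = "sex") -> dict:
    """Return a copy of the input with the protected attribute flipped."""
    flipped = copy.deepcopy(sample)
    flipped[attr] = "female" if sample[attr] == "male" else "male"
    return flipped

def individual_fairness_violations(model, samples, attr: str = "sex"):
    """Count inputs whose prediction changes when only the protected attribute changes."""
    violations = [s for s in samples
                  if model.predict(s) != model.predict(flip_protected(s, attr))]
    return len(violations), violations
```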
{"title":"Fairness Testing: A Comprehensive Survey and Analysis of Trends","authors":"Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, Federica Sarro","doi":"10.1145/3652155","DOIUrl":"https://doi.org/10.1145/3652155","url":null,"abstract":"<p>Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in the realm of fairness testing. We also identify widely-adopted datasets and open-source tools for fairness testing.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"87 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. The lack of static typing, however, poses challenges for fixing type errors, detecting bugs early, and understanding code. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but unfortunately, a large number of programs are still not annotated by developers. Annotation generation tools can leverage type inference techniques. However, existing works overlook several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation, leaving it unclear how far we have come and how far we can go.
In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, using a comprehensive set of metrics and categories, we find that existing tools differ in effectiveness and cannot achieve both high accuracy and high coverage. Then, we summarize six patterns that capture the limitations of type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations of and improvement directions for type annotation generation, which can inspire future work.
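In the spirit of the "simple but effective" tool mentioned above, the sketch below infers a return-type annotation from literal return values using Python's ast module. It is a deliberately naive illustration rather than the paper's tool; real inferrers combine static analysis, dynamic tracing, and learning-based prediction, and all names here are hypothetical.

```python
import ast

_LITERAL_TYPES = {bool: "bool", int: "int", float: "float", str: "str"}

def infer_return_annotation(func_source: str) -> str:
    """Infer a return-type annotation from a function's literal return values."""
    func = ast.parse(func_source).body[0]
    assert isinstance(func, ast.FunctionDef)
    seen = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Constant):
            value = node.value.value
            seen.add("None" if value is None else _LITERAL_TYPES.get(type(value), "Any"))
    return " | ".join(sorted(seen)) if seen else "Any"

if __name__ == "__main__":
    src = "def is_ready(flag):\n    if flag:\n        return True\n    return False\n"
    print(infer_return_annotation(src))  # -> bool
```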
{"title":"Generating Python Type Annotations from Type Inference: How Far Are We?","authors":"Yimeng Guo, Zhifei Chen, Lin Chen, Wenjie Xu, Yanhui Li, Yuming Zhou, Baowen Xu","doi":"10.1145/3652153","DOIUrl":"https://doi.org/10.1145/3652153","url":null,"abstract":"<p>In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. The lack of static typing makes programs face the challenges of fixing type errors, early bug detection, and code understanding. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but unfortunately, a large number of programs are still not annotated by developers. Annotation generation tools can utilize type inference techniques. However, several important aspects of type annotation generation are overlooked by existing works, such as in-depth effectiveness analysis, potential improvement exploration, and practicality evaluation. And it is unclear how far we have been and how far we can go. </p><p>In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools on a carefully-cleaned dataset. First, we use a comprehensive set of metrics and categories, finding that existing tools have different effectiveness and cannot achieve both high accuracy and high coverage. Then, we summarize six patterns to present the limitations in type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations and improvement directions in type annotation generation, which can inspire future work.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"51 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Context and Motivation] To foster a sustainable society within a sustainable environment, we must dramatically reshape our work and consumption activities, most of which are facilitated through software. Yet, most software engineers hardly consider the sustainability effects of the IT products and services they deliver. This issue is exacerbated by a lack of methods and tools for this purpose.
[Question/Problem] Despite the practical need for methods and tools that explicitly support consideration of the effects that IT products and services have on the sustainability of their intended environments, such methods and tools remain largely unavailable. Thus, urgent research is needed to understand how to design such tools for the IT community properly.
[Principal Ideas/Results] In this paper, we describe our experience using design science to create the Sustainability Awareness Framework (SusAF), which supports software engineers in anticipating and mitigating the potential sustainability effects during system development. More specifically, we identify and present the challenges faced during this process.
[Contribution] The challenges that we have faced and addressed in the development of the SusAF are likely to be relevant to others who aim to create methods and tools that integrate sustainability analysis into IT product and service development. Thus, the lessons learned in developing the SusAF are shared for the benefit of researchers and other professionals who design tools to that end.
{"title":"Lessons Learned from Developing a Sustainability Awareness Framework for Software Engineering using Design Science","authors":"Stefanie Betz, Birgit Penzenstadler, Leticia Duboc, Ruzanna Chitchyan, Sedef Akinli Kocak, Ian Brooks, Shola Oyedeji, Jari Porras, Norbert Seyff, Colin C. Venters","doi":"10.1145/3649597","DOIUrl":"https://doi.org/10.1145/3649597","url":null,"abstract":"<p><b>[Context and Motivation]</b> To foster a sustainable society within a sustainable environment, we must dramatically reshape our work and consumption activities, most of which are facilitated through software. Yet, most software engineers hardly consider the effects on the sustainability of the IT products and services they deliver. This issue is exacerbated by a lack of methods and tools for this purpose. </p><p><b>[Question/Problem]</b> Despite the practical need for methods and tools that explicitly support consideration of the effects that IT products and services have on the sustainability of their intended environments, such methods and tools remain largely unavailable. Thus, urgent research is needed to understand how to design such tools for the IT community properly. </p><p><b>[Principal Ideas/Results]</b> In this paper, we describe our experience using design science to create the Sustainability Awareness Framework (SusAF), which supports software engineers in anticipating and mitigating the potential sustainability effects during system development. More specifically, we identify and present the challenges faced during this process. </p><p><b>[Contribution]</b> The challenges that we have faced and addressed in the development of the SusAF are likely to be relevant to others who aim to create methods and tools to integrate sustainability analysis into their IT Products and Service development. Thus, the lessons learned in SusAF development are shared for the benefit of researchers and other professionals who design tools for that end.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"25 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140069865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci, Damian A. Tamburri
The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience with machine learning (ML), specifically natural language processing (NLP), to address this problem. Our method consists of three phases. First, we construct an advisory record