As large software projects grow rapidly, automatic code summarization techniques, which describe the main functionality of a piece of code as natural-language comments, play an essential role in helping developers understand and maintain such projects. Many research efforts have been devoted to building automatic code summarization approaches. Typical approaches are based on deep learning models that cast the task as a sequence-to-sequence problem: they take source code as input and output a natural-language summary. Code summarization models impose different input size limits on the source code, ranging from 50 to 10,000. However, how the input size limit affects the performance of code summarization models remains under-explored. In this paper, we first conduct an empirical study to investigate the impact of different input size limits on the quality of generated code comments. To our surprise, experiments on multiple models and datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of the generated comments.
Based on this finding, we further propose to use function signatures instead of full source code: we first extract the function signatures, which summarize the main functionality, and then feed them into code summarization models. Experiments and statistical analysis show that inputs with signatures are, on average, more than 2 percentage points better than inputs without signatures, demonstrating the effectiveness of involving function signatures in code summarization. We also invite programmers to complete a questionnaire evaluating the quality of summaries generated from the two kinds of inputs at different truncation levels. The results show that function signatures yield, on average, 9.2% more high-quality comments than full code.
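To make the idea concrete, the sketch below (in Python, purely for illustration; the paper's datasets, programming languages, and preprocessing pipeline may differ) extracts a function's signature tokens and applies a small input size limit before the tokens would be handed to a summarization model. All names here are hypothetical.

```python
import ast

def signature_tokens(source: str) -> list[str]:
    """Collect signature tokens (name, parameters, annotations, return type)
    from a Python function definition, discarding the body."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    tokens = [func.name]
    for arg in func.args.args:
        tokens.append(arg.arg)
        if arg.annotation is not None:
            tokens.append(ast.unparse(arg.annotation))
    if func.returns is not None:
        tokens.append(ast.unparse(func.returns))
    return tokens

def truncate(tokens: list[str], limit: int = 20) -> list[str]:
    """Apply a low input size limit, mirroring the truncation settings studied."""
    return tokens[:limit]

if __name__ == "__main__":
    code = "def load_config(path: str, strict: bool = True) -> dict:\n    ...\n"
    print(truncate(signature_tokens(code)))
    # ['load_config', 'path', 'str', 'strict', 'bool', 'dict']
```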
{"title":"Do Code Summarization Models Process Too Much Information? Function Signature May Be All What Is Needed","authors":"Xi Ding, Rui Peng, Xiangping Chen, Yuan Huang, Jing Bian, Zibin Zheng","doi":"10.1145/3652156","DOIUrl":"https://doi.org/10.1145/3652156","url":null,"abstract":"<p>With the fast development of large software projects, automatic code summarization techniques, which summarize the main functionalities of a piece of code using natural languages as comments, play essential roles in helping developers understand and maintain large software projects. Many research efforts have been devoted to building automatic code summarization approaches. Typical code summarization approaches are based on deep learning models. They transform the task into a sequence-to-sequence task, which inputs source code and outputs summarizations in natural languages. All code summarization models impose different input size limits, such as 50 to 10,000, for the input source code. However, how the input size limit affects the performance of code summarization models still remains under-explored. In this paper, we first conduct an empirical study to investigate the impacts of different input size limits on the quality of generated code comments. To our surprise, experiments on multiple models and datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of generated comments. </p><p>Based on this finding, we further propose to use function signatures instead of full source code to summarize the main functionalities first and then input the function signatures into code summarization models. Experiments and statistical results show that inputs with signatures are, on average, more than 2 percentage points better than inputs without signatures and thus demonstrate the effectiveness of involving function signatures in code summarization. We also invite programmers to do a questionnaire to evaluate the quality of code summaries generated by two inputs with different truncation levels. The results show that function signatures generate, on average, 9.2% more high-quality comments than full code.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"2 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140127520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to its importance and widespread use in industry, automated testing of REST APIs has attracted major interest from the research community in the last few years. However, most of the work in the literature has focused on black-box fuzzing. Although existing fuzzers have been used to automatically find many faults in existing APIs, several open research challenges still hinder the achievement of better results (e.g., in terms of code coverage and fault finding). For example, under-specified schemas are a major issue for black-box fuzzers. Currently, EvoMaster is the only existing tool that supports white-box fuzzing of REST APIs. In this paper, we present a series of novel white-box heuristics, including, for example, how to deal with under-specified constraints in API schemas as well as under-specified schemas in SQL databases. Our novel techniques are implemented as an extension to our open-source, search-based fuzzer EvoMaster. An empirical study on 14 APIs from the EMB corpus, plus one industrial API, shows clear improvements in the results for several of these APIs.
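To illustrate what "under-specified" means here, the hypothetical Python fragment below declares an id path parameter only as an integer in an OpenAPI-style schema, while the handler additionally requires it to be positive and to exist in the database. A black-box fuzzer that reads only the schema cannot discover these extra constraints, whereas white-box heuristics can observe them in the code and the database state. The schema fragment and handler are invented for this example and are not taken from EvoMaster or the studied APIs.

```python
# Hypothetical OpenAPI-style fragment: `id` is declared only as an integer.
OPENAPI_FRAGMENT = {
    "/items/{id}": {
        "get": {
            "parameters": [
                {"name": "id", "in": "path", "required": True,
                 "schema": {"type": "integer"}}  # no minimum, no existence constraint
            ]
        }
    }
}

def get_item(item_id: int, db: dict):
    """Handler enforcing constraints that the schema above never mentions."""
    if item_id <= 0:        # under-specified constraint: must be positive
        raise ValueError("id must be positive")
    if item_id not in db:   # under-specified constraint: depends on database state
        raise KeyError("no such item")
    return db[item_id]
```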
{"title":"Advanced White-Box Heuristics for Search-Based Fuzzing of REST APIs","authors":"Andrea Arcuri, Man Zhang, Juan Pablo Galeotti","doi":"10.1145/3652157","DOIUrl":"https://doi.org/10.1145/3652157","url":null,"abstract":"<p>Due to its importance and widespread use in industry, automated testing of REST APIs has attracted major interest from the research community in the last few years. However, most of the work in the literature has been focused on black-box fuzzing. Although existing fuzzers have been used to automatically find many faults in existing APIs, there are still several open research challenges that hinder the achievement of better results (e.g., in terms of code coverage and fault finding). For example, under-specified schemas are a major issue for black-box fuzzers. Currently, <span>EvoMaster</span> is the only existing tool that supports white-box fuzzing of REST APIs. In this paper, we provide a series of novel white-box heuristics, including for example how to deal with under-specified constrains in API schemas, as well as under-specified schemas in SQL databases. Our novel techniques are implemented as an extension to our open-source, search-based fuzzer <span>EvoMaster</span>. An empirical study on 14 APIs from the EMB corpus, plus one industrial API, shows clear improvements of the results in some of these APIs.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"89 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Source code authorship attribution is an important problem in practical applications such as plagiarism detection, software forensics, and copyright disputes. Recent studies show that existing methods for source code authorship attribution can be significantly affected by time evolution, leading to a year-by-year decrease in attribution accuracy. To alleviate the accuracy degradation that Deep Learning (DL)-based source code authorship attribution suffers under time evolution, we propose a new framework called Time Domain Adaptation (TimeDA), which adds new feature extractors to the original DL-based code attribution framework and enhances the original model's ability to learn source-domain features without requiring new or additional source data. Moreover, we employ a centroid-based pseudo-labeling strategy using neighborhood clustering entropy for adaptive learning to improve the robustness of DL-based code authorship attribution. Experimental results show that TimeDA can significantly enhance the robustness of DL-based source code authorship attribution to time evolution, with an average improvement of 8.7% on the Java dataset and 5.2% on the C++ dataset. In addition, TimeDA benefits from the centroid-based pseudo-labeling strategy, which reduces model training time by 87.3% compared to traditional unsupervised domain adaptation methods.
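The sketch below illustrates the general idea of centroid-based pseudo-labeling in feature space, assuming NumPy arrays of extracted code features; TimeDA's actual design (its feature extractors, the use of neighborhood clustering entropy, and the training loop) is more involved, and all names below are illustrative.

```python
import numpy as np

def centroid_pseudo_labels(src_feats, src_labels, tgt_feats, n_classes):
    """Assign pseudo-labels to target-domain samples by nearest source-class centroid.

    A minimal sketch: the entropy of the soft assignment is also returned so that
    unreliable pseudo-labels can be filtered before adaptive training.
    """
    # Class centroids computed from labeled source-domain features.
    centroids = np.stack([src_feats[src_labels == c].mean(axis=0)
                          for c in range(n_classes)])
    # Euclidean distance from every target sample to every centroid.
    dists = np.linalg.norm(tgt_feats[:, None, :] - centroids[None, :, :], axis=-1)
    pseudo = dists.argmin(axis=1)
    # Soft assignment over classes; high entropy marks uncertain samples.
    probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return pseudo, entropy
```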
{"title":"Reducing the Impact of Time Evolution on Source Code Authorship Attribution via Domain Adaptation","authors":"Zhen Li, Shasha Zhao, Chen Chen, Qian Chen","doi":"10.1145/3652151","DOIUrl":"https://doi.org/10.1145/3652151","url":null,"abstract":"<p>Source code authorship attribution is an important problem in practical applications such as plagiarism detection, software forensics, and copyright disputes. Recent studies show that existing methods for source code authorship attribution can be significantly affected by time evolution, leading to a decrease in attribution accuracy year by year. To alleviate the problem that Deep Learning (DL)-based source code authorship attribution degrading in accuracy due to time evolution, we propose a new framework called <underline>Time</underline> <underline>D</underline>omain <underline>A</underline>daptation (TimeDA) by adding new feature extractors to the original DL-based code attribution framework that enhances the learning ability of the original model on source domain features without requiring new or more source data. Moreover, we employ a centroid-based pseudo-labeling strategy using neighborhood clustering entropy for adaptive learning to improve the robustness of DL-based code authorship attribution. Experimental results show that TimeDA can significantly enhance the robustness of DL-based source code authorship attribution to time evolution, with an average improvement of 8.7% on the Java dataset and 5.2% on the C++ dataset. In addition, our TimeDA benefits from employing the centroid-based pseudo-labeling strategy, which significantly reduced the model training time by 87.3% compared to traditional unsupervised domain adaptive methods.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"89 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in the realm of fairness testing. We also identify widely-adopted datasets and open-source tools for fairness testing.
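As one concrete instance of what such testing can look like, the sketch below implements a common individual-fairness oracle: flip a protected attribute and check whether the model's prediction changes. It assumes a hypothetical `model.predict` interface over feature dictionaries and illustrates only one of the many workflows and metrics covered by the survey.

```python
import copy

def flip_protected(sample: dict, attr: str = "sex") -> dict:
    """Return a copy of the input with the protected attribute flipped."""
    flipped = copy.deepcopy(sample)
    flipped[attr] = "female" if sample[attr] == "male" else "male"
    return flipped

def individual_fairness_violations(model, samples, attr: str = "sex"):
    """Count inputs whose prediction changes when only the protected attribute changes."""
    violations = [s for s in samples
                  if model.predict(s) != model.predict(flip_protected(s, attr))]
    return len(violations), violations
```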
{"title":"Fairness Testing: A Comprehensive Survey and Analysis of Trends","authors":"Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, Federica Sarro","doi":"10.1145/3652155","DOIUrl":"https://doi.org/10.1145/3652155","url":null,"abstract":"<p>Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in the realm of fairness testing. We also identify widely-adopted datasets and open-source tools for fairness testing.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"87 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. The lack of static typing, however, poses challenges for fixing type errors, detecting bugs early, and understanding code. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but unfortunately, a large number of programs are still not annotated by developers. Annotation generation tools can leverage type inference techniques. However, existing works overlook several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation, leaving it unclear how far we have come and how far we can go.
In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, using a comprehensive set of metrics and categories, we find that existing tools differ in effectiveness and cannot achieve both high accuracy and high coverage. Then, we summarize six patterns that capture the limitations of type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations of and improvement directions for type annotation generation, which can inspire future work.
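In the spirit of the "simple but effective" tool mentioned above, the sketch below infers a return-type annotation from literal return values using Python's ast module. It is a deliberately naive illustration rather than the paper's tool; real inferrers combine static analysis, dynamic tracing, and learning-based prediction, and all names here are hypothetical.

```python
import ast

_LITERAL_TYPES = {bool: "bool", int: "int", float: "float", str: "str"}

def infer_return_annotation(func_source: str) -> str:
    """Infer a return-type annotation from a function's literal return values."""
    func = ast.parse(func_source).body[0]
    assert isinstance(func, ast.FunctionDef)
    seen = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Constant):
            value = node.value.value
            seen.add("None" if value is None else _LITERAL_TYPES.get(type(value), "Any"))
    return " | ".join(sorted(seen)) if seen else "Any"

if __name__ == "__main__":
    src = "def is_ready(flag):\n    if flag:\n        return True\n    return False\n"
    print(infer_return_annotation(src))  # -> bool
```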
{"title":"Generating Python Type Annotations from Type Inference: How Far Are We?","authors":"Yimeng Guo, Zhifei Chen, Lin Chen, Wenjie Xu, Yanhui Li, Yuming Zhou, Baowen Xu","doi":"10.1145/3652153","DOIUrl":"https://doi.org/10.1145/3652153","url":null,"abstract":"<p>In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. The lack of static typing makes programs face the challenges of fixing type errors, early bug detection, and code understanding. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but unfortunately, a large number of programs are still not annotated by developers. Annotation generation tools can utilize type inference techniques. However, several important aspects of type annotation generation are overlooked by existing works, such as in-depth effectiveness analysis, potential improvement exploration, and practicality evaluation. And it is unclear how far we have been and how far we can go. </p><p>In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools on a carefully-cleaned dataset. First, we use a comprehensive set of metrics and categories, finding that existing tools have different effectiveness and cannot achieve both high accuracy and high coverage. Then, we summarize six patterns to present the limitations in type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations and improvement directions in type annotation generation, which can inspire future work.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"51 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Context and Motivation] To foster a sustainable society within a sustainable environment, we must dramatically reshape our work and consumption activities, most of which are facilitated through software. Yet, most software engineers hardly consider the sustainability effects of the IT products and services they deliver. This issue is exacerbated by a lack of methods and tools for this purpose.
[Question/Problem] Despite the practical need for methods and tools that explicitly support consideration of the effects that IT products and services have on the sustainability of their intended environments, such methods and tools remain largely unavailable. Thus, urgent research is needed to understand how to design such tools for the IT community properly.
[Principal Ideas/Results] In this paper, we describe our experience using design science to create the Sustainability Awareness Framework (SusAF), which supports software engineers in anticipating and mitigating the potential sustainability effects during system development. More specifically, we identify and present the challenges faced during this process.
[Contribution] The challenges that we have faced and addressed in the development of the SusAF are likely to be relevant to others who aim to create methods and tools that integrate sustainability analysis into IT product and service development. Thus, the lessons learned in developing the SusAF are shared for the benefit of researchers and other professionals who design tools to that end.
{"title":"Lessons Learned from Developing a Sustainability Awareness Framework for Software Engineering using Design Science","authors":"Stefanie Betz, Birgit Penzenstadler, Leticia Duboc, Ruzanna Chitchyan, Sedef Akinli Kocak, Ian Brooks, Shola Oyedeji, Jari Porras, Norbert Seyff, Colin C. Venters","doi":"10.1145/3649597","DOIUrl":"https://doi.org/10.1145/3649597","url":null,"abstract":"<p><b>[Context and Motivation]</b> To foster a sustainable society within a sustainable environment, we must dramatically reshape our work and consumption activities, most of which are facilitated through software. Yet, most software engineers hardly consider the effects on the sustainability of the IT products and services they deliver. This issue is exacerbated by a lack of methods and tools for this purpose. </p><p><b>[Question/Problem]</b> Despite the practical need for methods and tools that explicitly support consideration of the effects that IT products and services have on the sustainability of their intended environments, such methods and tools remain largely unavailable. Thus, urgent research is needed to understand how to design such tools for the IT community properly. </p><p><b>[Principal Ideas/Results]</b> In this paper, we describe our experience using design science to create the Sustainability Awareness Framework (SusAF), which supports software engineers in anticipating and mitigating the potential sustainability effects during system development. More specifically, we identify and present the challenges faced during this process. </p><p><b>[Contribution]</b> The challenges that we have faced and addressed in the development of the SusAF are likely to be relevant to others who aim to create methods and tools to integrate sustainability analysis into their IT Products and Service development. Thus, the lessons learned in SusAF development are shared for the benefit of researchers and other professionals who design tools for that end.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"25 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140069865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci, Damian A. Tamburri
The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience with machine learning (ML), specifically natural language processing (NLP), to address this problem. Our method consists of three phases. First, we construct an advisory record