
Latest publications: 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)

An Empirical Study of Challenges in Converting Deep Learning Models
Pub Date : 2022-06-28 DOI: 10.1109/ICSME55016.2022.00010
Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhengyong Jiang
There is an increase in deploying Deep Learning (DL)-based software systems in real-world applications. Usually, DL models are developed and trained using DL frameworks like TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats to represent and train DL models (deep neural networks), and usually those formats cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from where they were developed. To solve the interoperability issue and make DL models compatible with different frameworks/environments, exchange formats such as ONNX and CoreML have been introduced for DL models. However, ONNX and CoreML have never been empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may lead to poor quality of deployed DL-based software systems. In this paper, we conduct the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to two runtime environments designated for such formats, to be evaluated. We investigate the prediction accuracy before and after conversion. Our results reveal that the prediction accuracy of converted models is at the same level as that of the originals. The performance (time cost and memory consumption) of converted models is studied as well. The size of the models is reduced after conversion, which can result in optimized DL-based software deployment. We also study the adversarial robustness of converted models to verify the robustness of deployed DL-based software. Leveraging state-of-the-art adversarial attack approaches, converted models are generally assessed to be as robust as the originals. However, the obtained results show that CoreML models are more vulnerable to adversarial attacks than ONNX models. The general message of our findings is that DL developers should be cautious about deploying converted models, which may 1) perform poorly when switching from one framework to another, 2) be challenging to deploy robustly, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks, like bug prediction.
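As a minimal illustration of the kind of conversion the study evaluates, the sketch below exports a small PyTorch model to ONNX and checks whether the ONNX Runtime predictions match the original; the stand-in model, file name, and comparison are illustrative assumptions, not the paper's actual experimental setup.

```python
# Hypothetical sketch: export a PyTorch model to ONNX and compare predictions.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# A small stand-in model (the paper uses five widely used DL models).
model = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

dummy_input = torch.randn(1, 28 * 28)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Run the converted model in ONNX Runtime and compare with the original output.
session = ort.InferenceSession("model.onnx")
with torch.no_grad():
    original = model(dummy_input).numpy()
converted = session.run(None, {"input": dummy_input.numpy()})[0]

# A loose check: the study asks whether accuracy is preserved, not bit-equality.
print("max abs difference:", np.abs(original - converted).max())
print("predicted classes agree:", np.argmax(original) == np.argmax(converted))
```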
{"title":"An Empirical Study of Challenges in Converting Deep Learning Models","authors":"Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhengyong Jiang","doi":"10.1109/ICSME55016.2022.00010","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00010","url":null,"abstract":"There is an increase in deploying Deep Learning (DL)-based software systems in real-world applications. Usually, DL models are developed and trained using DL frameworks like TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats to represent and train DL models (deep neural networks), and usually those formats cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from where they were developed. To solve the interoperability issue and make DL models compatible with different frameworks/environments, some exchange formats are introduced for DL models, like ONNX and CoreML. However, ONNX and CoreML were never empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may lead to poor quality of deployed DL-based software systems. We conduct, in this paper, the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to two runtime environments designated for such formats, to be evaluated. We investigate the prediction accuracy before and after conversion. Our results unveil that the prediction accuracy of converted models are at the same level of originals. The performance (time cost and memory consumption) of converted models are studied as well. The size of models are reduced after conversion, which can result in optimized DL-based software deployment. We also study the adversarial robustness of converted models to make sure about the robustness of deployed DL-based software. Leveraging the state-of-the-art adversarial attack approaches, converted models are generally assessed robust at the same level of originals. However, obtained results show that CoreML models are more vulnerable to adversarial attacks compared to ONNX. The general message of our findings is that DL developers should be cautious on the deployment of converted models that may 1) perform poorly while switching from one framework to another, 2) have challenges in robust deployment, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks, like bug prediction.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134415573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT
Pub Date : 2022-06-27 DOI: 10.1109/ICSME55016.2022.00016
Chi Yu, Guang Yang, Xiang Chen, Ke Liu, Yanlin Zhou
Developers use shell commands for many tasks, such as file system management, network control, and process management. Bash is one of the most commonly used shells and plays an important role in Linux system development and maintenance. Due to the language flexibility of Bash code, developers who are not familiar with Bash often have difficulty understanding the purpose and functionality of Bash code. In this study, we investigate the Bash code comment generation problem and propose an automatic method, BASHEXPLAINER, based on a two-stage training strategy. In the first stage, we train a Bash encoder by fine-tuning CodeBERT on our constructed Bash code corpus. In the second stage, we first retrieve the most similar code from the code repository for the target code based on semantic and lexical similarity. Then we use the trained Bash encoder to generate two vector representations. Finally, we fuse these two vector representations via the fusion layer and generate the code comment through the decoder. To show the competitiveness of our proposed method, we construct a high-quality corpus by combining the corpus shared in the previous NL2Bash study and the corpus shared in the NLC2CMD competition. This corpus contains 10,592 Bash code snippets and their corresponding comments. We then select ten baselines from previous studies on automatic code comment generation, covering information retrieval methods, deep learning methods, and hybrid methods. The experimental results show that, in terms of the performance measures BLEU-3/4, METEOR, and ROUGE-L, BASHEXPLAINER outperforms all baselines by at least 8.75%, 9.29%, 4.77%, and 3.86%, respectively. We then design ablation experiments to show the rationality of BASHEXPLAINER's component settings. Later, we conduct a human study to further show the competitiveness of BASHEXPLAINER. Finally, we develop a browser plug-in based on BASHEXPLAINER to facilitate developers' understanding of Bash code.
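The retrieval step of the second stage can be pictured with a short sketch: encode the target code and candidate snippets with a CodeBERT-style encoder and keep the candidate with the highest cosine similarity. The checkpoint name, mean pooling, and example commands below are assumptions for illustration; the paper additionally combines lexical similarity and fuses the retrieved code through a fusion layer, which this sketch omits.

```python
# Hypothetical sketch of the semantic-retrieval step (not BASHEXPLAINER's exact pipeline).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    """Mean-pool the last hidden states into one vector per snippet."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

target = 'find . -name "*.log" -mtime +7 -delete'
corpus = [
    'find /tmp -name "*.tmp" -delete',
    'tar -czf backup.tar.gz /home/user',
    'grep -r "ERROR" /var/log',
]

target_vec = embed(target)
scores = [torch.cosine_similarity(target_vec, embed(c), dim=0).item() for c in corpus]
best = corpus[max(range(len(corpus)), key=lambda i: scores[i])]
print("most similar code:", best)
```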
{"title":"BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT","authors":"Chi Yu, Guang Yang, Xiang Chen, Ke Liu, Yanlin Zhou","doi":"10.1109/ICSME55016.2022.00016","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00016","url":null,"abstract":"Developers use shell commands for many tasks, such as file system management, network control, and process management. Bash is one of the most commonly used shells and plays an important role in Linux system development and maintenance. Due to the language flexibility of Bash code, developers who are not familiar with Bash often have difficulty understanding the purpose and functionality of Bash code. In this study, we study Bash code comment generation problem and proposed an automatic method BASHEXPLAINER based on two-stage training strategy. In the first stage, we train a Bash encoder by fine-tuning CodeBERT on our constructed Bash code corpus. In the second stage, we first retrieve the most similar code from the code repository for the target code based on semantic and lexical similarity. Then we use the trained Bash encoder to generate two vector representations. Finally, we fuse these two vector representations via the fusion layer and generate the code comment through the decoder. To show the competitiveness of our proposed method, we construct a high-quality corpus by combining the corpus shared in the previous NL2Bash study and the corpus shared in the NLC2CMD competition. This corpus contains 10,592 Bash codes and corresponding comments. Then we selected ten baselines from previous studies on automatic code comment generation, which cover information retrieval methods, deep learning methods, and hybrid methods. The experimental results show that in terms of the performance measures BLEU-3/4, METEOR, and ROUGR-L, BASHEXPLAINER can outperform all baselines by at least 8.75%, 9.29%, 4.77% and 3.86%. Then we design ablation experiments to show the component setting rationality of BASHEXPLAINER. Later, we conduct a human study to further show the competitiveness of BASHEXPLAINER. Finally, we develop a browser plug-in based on BASHEXPLAINER to facilitate the understanding of the Bash code for developers.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1710 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134506439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
AutoPRTitle: A Tool for Automatic Pull Request Title Generation
Pub Date : 2022-06-23 DOI: 10.1109/ICSME55016.2022.00058
I. Irsan, Ting Zhang, Ferdian Thung, David Lo, Lingxiao Jiang
With the rise of the pull request mechanism in software development, the quality of pull requests has gained more attention. Prior works focus on improving the quality of pull request descriptions, and several approaches have been proposed to automatically generate pull request descriptions. As an essential component of a pull request, pull request titles have not received a similar level of attention. To further facilitate automation in software development and to help developers draft high-quality pull request titles, we introduce AutoPRTitle. AutoPRTitle is specifically designed to generate pull request titles automatically. It can generate a precise and succinct pull request title based on the pull request description, commit messages, and the associated issue titles. AutoPRTitle is built upon a state-of-the-art text summarization model, BART, which has been pre-trained on large-scale English corpora. We further fine-tuned BART on a pull request dataset containing high-quality pull request titles. We implemented AutoPRTitle as a stand-alone web application. We conducted two sets of evaluations: one concerning model accuracy and the other concerning tool usability. For model accuracy, BART outperforms the best baseline by 24.6%, 40.5%, and 23.3%, respectively. For tool usability, the evaluators consider our tool easy to use and useful for creating pull request titles of good quality.
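A minimal sketch of the underlying generation step: concatenate the PR description, commit messages, and linked issue title and let a BART summarizer decode a short title. The checkpoint below is a generic pre-trained summarization model, not the authors' fine-tuned one, and the input fields and generation settings are illustrative assumptions.

```python
# Hypothetical sketch: generate a PR title with a pre-trained BART summarizer.
from transformers import BartForConditionalGeneration, BartTokenizer

# Generic summarization checkpoint; AutoPRTitle uses BART fine-tuned on PR data.
name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(name)
model = BartForConditionalGeneration.from_pretrained(name)

source = " ".join([
    "Description: Fix crash when the configuration file is missing.",
    "Commits: handle FileNotFoundError in config loader; add regression test.",
    "Issue: App crashes on startup without config.yaml",
])

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
ids = model.generate(**inputs, max_length=16, min_length=4, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```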
{"title":"AutoPRTitle: A Tool for Automatic Pull Request Title Generation","authors":"I. Irsan, Ting Zhang, Ferdian Thung, David Lo, Lingxiao Jiang","doi":"10.1109/ICSME55016.2022.00058","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00058","url":null,"abstract":"With the rise of the pull request mechanism in software development, the quality of pull requests has gained more attention. Prior works focus on improving the quality of pull request descriptions and several approaches have been proposed to automatically generate pull request descriptions. As an essential component of a pull request, pull request titles have not received a similar level of attention. To further facilitate automation in software development and to help developers draft high-quality pull request titles, we introduce AutoPRTitle. AutoPRTitle is specifically designed to generate pull request titles automatically. AutoPRTitle can generate a precise and succinct pull request title based on the pull request description, commit messages, and the associated issue titles. AutoPRTitle is built upon a state-of-the-art text summarization model, BART, which has been pre-trained on large-scale English corpora. We further fine-tuned BART in a pull request dataset containing high-quality pull request titles. We implemented AutoPRTitle as a stand-alone web application. We conducted two sets of evaluations: one concerning the model accuracy and the other concerning the tool usability. For model accuracy, BART outperforms the best baseline by 24.6%, 40.5%, and 23.3%, respectively. For tool usability, the evaluators consider our tool as easy-to-use and useful when creating a pull request title of good quality.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126828103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
There Ain’t No Such Thing as a Free Custom Memory Allocator
Pub Date : 2022-06-23 DOI: 10.1109/ICSME55016.2022.00079
Gunnar Kudrjavets, Jeff Thomas, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi
Using custom memory allocators is an efficient performance optimization technique. However, dependency on a custom allocator can introduce several maintenance-related issues. We present lessons learned from industry, provide critical guidance for using custom memory allocators, and enumerate various challenges associated with integrating them. These recommendations are based on years of experience incorporating custom allocators into different industrial software projects.
{"title":"There Ain’t No Such Thing as a Free Custom Memory Allocator","authors":"Gunnar Kudrjavets, Jeff Thomas, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi","doi":"10.1109/ICSME55016.2022.00079","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00079","url":null,"abstract":"Using custom memory allocators is an efficient performance optimization technique. However, dependency on a custom allocator can introduce several maintenance-related issues. We present lessons learned from the industry and provide critical guidance for using custom memory allocators and enumerate various challenges associated with integrating them. These recommendations are based on years of experience incorporating custom allocators into different industrial software projects.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125663399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Pull Request Title Generation
Pub Date : 2022-06-21 DOI: 10.1109/ICSME55016.2022.00015
Ting Zhang, I. Irsan, Ferdian Thung, Donggyun Han, David Lo, Lingxiao Jiang
Pull Requests (PRs) are a mechanism on modern collaborative coding platforms, such as GitHub. PRs allow developers to tell others that their code changes are available for merging into another branch in a repository. A PR needs to be reviewed and approved by the core team of the repository before the changes are merged into the branch. Usually, reviewers need to identify a PR that is in line with their interests before providing a review. By default, PRs are arranged in a list view that shows the titles of PRs. Therefore, it is desirable to have a precise and concise title, which is beneficial for both reviewers and other developers. However, developers often do not provide good titles; we find that many existing PR titles are either inappropriate in length (i.e., too short or too long) or fail to convey useful information, which may result in the PR being ignored or rejected. Therefore, there is a need for automatic techniques to help developers draft high-quality titles. In this paper, we introduce the task of automatic generation of PR titles. We formulate the task as a one-sentence summarization task. To facilitate research on this task, we construct a dataset that consists of 43,816 PRs from 495 GitHub repositories. We evaluate state-of-the-art summarization approaches on the automatic PR title generation task. We leverage ROUGE metrics to automatically evaluate the summarization approaches and also conduct a manual evaluation. The experimental results indicate that BART is the best technique for generating satisfactory PR titles, with ROUGE-1, ROUGE-2, and ROUGE-L F1-scores of 47.22, 25.27, and 43.12, respectively. The manual evaluation also shows that the titles generated by BART are preferred.
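The automatic part of such an evaluation can be reproduced in a few lines with a ROUGE implementation; the sketch below uses the `rouge-score` package and invented reference/generated titles, so the package choice and example strings are assumptions rather than the authors' exact setup.

```python
# Hypothetical sketch: score a generated PR title against its reference with ROUGE.
from rouge_score import rouge_scorer

reference = "Fix crash when configuration file is missing on startup"
generated = "Fix startup crash caused by missing configuration file"

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)

for name, result in scores.items():
    # Each result holds precision, recall, and F1; the paper reports F1-scores.
    print(f"{name}: F1 = {result.fmeasure:.4f}")
```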
{"title":"Automatic Pull Request Title Generation","authors":"Ting Zhang, I. Irsan, Ferdian Thung, Donggyun Han, David Lo, Lingxiao Jiang","doi":"10.1109/ICSME55016.2022.00015","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00015","url":null,"abstract":"Pull Requests (PRs) are a mechanism on modern collaborative coding platforms, such as GitHub. PRs allow developers to tell others that their code changes are available for merging into another branch in a repository. A PR needs to be reviewed and approved by the core team of the repository before the changes are merged into the branch. Usually, reviewers need to identify a PR that is in line with their interests before providing a review. By default, PRs are arranged in a list view that shows the titles of PRs. Therefore, it is desirable to have a precise and concise title, which is beneficial for both reviewers and other developers. However, it is often the case that developers do not provide good titles; we find that many existing PR titles are either inappropriate in length (i.e., too short or too long) or fail to convey useful information, which may result in PR being ignored or rejected. Therefore, there is a need for automatic techniques to help developers draft high-quality titles.In this paper, we introduce the task of automatic generation of PR titles. We formulate the task as a one-sentence summarization task. To facilitate the research on this task, we construct a dataset that consists of 43,816 PRs from 495 GitHub repositories. We evaluated the state-of-the-art summarization approaches for the automatic PR title generation task. We leverage ROUGE metrics to automatically evaluate the summarization approaches and conduct a manual evaluation. The experimental results indicate that BART is the best technique for generating satisfactory PR titles with ROUGE-1, ROUGE-2, and ROUGE-L F1-scores of 47.22, 25.27, and 43.12, respectively. The manual evaluation also shows that the titles generated by BART are preferred.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128853493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
CATTO: Just-in-time Test Case Selection and Execution
Pub Date : 2022-06-17 DOI: 10.1109/ICSME55016.2022.00059
Dario Amoroso d'Aragona, Fabiano Pecorelli, Simone Romano, G. Scanniello, M. T. Baldassarre, Andrea Janes, Valentina Lenarduzzi
Regression testing aims to prevent errors that have already been corrected once from creeping back into a system that has been updated. A naïve approach consists of re-running the entire test suite (TS) against the changed version of the software under test (SUT). However, this might result in a time- and resource-consuming process, e.g., when dealing with large and/or complex SUTs and TSs. To avoid this problem, Test Case Selection (TCS) approaches can be used. These approaches build a temporary TS comprising only those test cases (TCs) that are relevant to the changes made to the SUT, thus avoiding the execution of unnecessary TCs. In this paper, we introduce CATTO (Commit Adaptive Tool for Test suite Optimization), a tool implementing a TCS strategy for SUTs written in Java, as well as a wrapper that allows developers to use CATTO within IntelliJ IDEA and to execute CATTO just-in-time before committing changes to the repository. We conducted a preliminary evaluation of CATTO on seven open-source Java projects to evaluate the reduction of the test-suite size, the loss of fault-revealing TCs, and the loss of fault-detection capability. The results suggest that CATTO can be of help to developers when performing TCS. The video demo and the documentation of the tool are available at: https://catto-tool.github.io/
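To make the idea of change-based selection concrete, here is a deliberately simplified sketch that keeps only those test files that mention the module name of a changed source file. It illustrates the general TCS principle, not CATTO's actual analysis (CATTO targets Java and IntelliJ IDEA); the repository layout, naming convention, and matching heuristic are assumptions.

```python
# Hypothetical sketch of change-based test case selection (not CATTO's algorithm).
import subprocess
from pathlib import Path

def changed_source_files() -> list[str]:
    """Names of source files modified since the last commit, per `git diff`."""
    out = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line.endswith(".py")]

def select_tests(test_dir: str = "tests") -> set[Path]:
    """Keep a test file if it references the module name of any changed file."""
    changed_modules = {Path(f).stem for f in changed_source_files()}
    selected = set()
    for test_file in Path(test_dir).rglob("test_*.py"):
        text = test_file.read_text(encoding="utf-8", errors="ignore")
        if any(module in text for module in changed_modules):
            selected.add(test_file)
    return selected

if __name__ == "__main__":
    for test in sorted(select_tests()):
        print("selected:", test)
```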
{"title":"CATTO: Just-in-time Test Case Selection and Execution","authors":"Dario Amoroso d'Aragona, Fabiano Pecorelli, Simone Romano, G. Scanniello, M. T. Baldassarre, Andrea Janes, Valentina Lenarduzzi","doi":"10.1109/ICSME55016.2022.00059","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00059","url":null,"abstract":"Regression testing wants to prevent that errors, which have already been corrected once, creep back into a system that has been updated. A naïve approach consists of re-running the entire test suite (TS) against the changed version of the software under test (SUT). However, this might result in a time-and resource-consuming process; e.g., when dealing with large and/or complex SUTs and TSs. To avoid this problem, Test Case Selection (TCS) approaches can be used. This kind of approaches build a temporary TS comprising only those test cases (TCs) that are relevant to the changes made to the SUT, so avoiding executing unnecessary TCs. In this paper, we introduce CATTO (Commit Adaptive Tool for Test suite Optimization), a tool implementing a TCS strategy for SUTs written in Java as well as a wrapper to allow developers to use CATTO within IntelliJ IDEA and to execute CATTO just-in-time before committing changes to the repository. We conducted a preliminary evaluation of CATTO on seven open-source Java projects to evaluate the reduction of the test-suite size, the loss of fault-revealing TCs, and the loss of fault-detection capability. The results suggest that CATTO can be of help to developers when performing TCS. The video demo and the documentation of the tool is available at: https://catto-tool.github.io/","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126151606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
OpenCBS: An Open-Source COBOL Defects Benchmark Suite
Pub Date : 2022-06-13 DOI: 10.1109/ICSME55016.2022.00030
Dylan T. Lee, Austin Z. Henley, B. Hinshaw, Rahul Pandita
As the current COBOL workforce retires, entry-level developers are left to keep complex legacy systems maintained and operational. This creates a massive gap in knowledge and ability as companies have their veteran developers replaced by a new, inexperienced workforce. Additionally, the lack of COBOL and mainframe technology in the current academic curriculum further increases the learning curve for this new generation of developers. These issues are becoming even more pressing due to the business-critical nature of these systems, which makes migrating or replacing the mainframe and COBOL unlikely anytime soon. As a result, there is now a huge need for tools and resources to increase new developers' code comprehension and their ability to perform routine tasks such as debugging and defect location. Extensive work has been done in the software engineering field on the creation of such resources. However, the proprietary nature of COBOL and mainframe systems has restricted the amount of work and the number of open-source tools available for this domain. To address this issue, our work leverages publicly available technical forum data to build an open-source collection of COBOL programs embodying issues/defects faced by COBOL developers. These programs were reconstructed and organized into a benchmark suite to facilitate the testing of developer tools. Our goal is to provide an open-source COBOL benchmark and testing suite that encourages community contribution and serves as a resource for researchers and tool-smiths in this domain.
{"title":"OpenCBS: An Open-Source COBOL Defects Benchmark Suite","authors":"Dylan T. Lee, Austin Z. Henley, B. Hinshaw, Rahul Pandita","doi":"10.1109/ICSME55016.2022.00030","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00030","url":null,"abstract":"As the current COBOL workforce retires, entry-level developers are left to keep complex legacy systems maintained and operational. This creates a massive gap in knowledge and ability as companies are having their veteran developers replaced with a new, inexperienced workforce. Additionally, the lack of COBOL and mainframe technology in the current academic curriculum further increases the learning curve for this new generation of developers. These issues are becoming even more pressing due to the business-critical nature of these systems, which makes migrating or replacing the mainframe and COBOL unlikely anytime soon. As a result, there is now a huge need for tools and resources to increase new developers’ code comprehension and ability to perform routine tasks such as debugging and defect location. Extensive work has been done in the software engineering field on the creation of such resources. However, the proprietary nature of COBOL and mainframe systems has restricted the amount of work and the number of open-source tools available for this domain. To address this issue, our work leverages the publicly available technical forum data to build an open-source collection of COBOL programs embodying issues/defects faced by COBOL developers. These programs were reconstructed and organized in a benchmark suite to facilitate the testing of developer tools. Our goal is to provide an open-source COBOL benchmark and testing suite that encourage community contribution and serve as a resource for researchers and tool-smiths in this domain.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133717160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Is Kernel Code Different From Non-Kernel Code? A Case Study of BSD Family Operating Systems
Pub Date : 2022-06-11 DOI: 10.1109/ICSME55016.2022.00027
Gunnar Kudrjavets, Jeff Thomas, Nachiappan Nagappan, Ayushi Rastogi
Studies on software evolution explore code churn and code velocity at the abstraction level of a company or an entire project. We argue that this approach misses the differences among abstraction layers and subsystems of large projects. We conduct a case study on four BSD family operating systems: DragonFlyBSD, FreeBSD, NetBSD, and OpenBSD, to investigate the evolution of code churn and code velocity across kernel and non-kernel code. We mine commits for characteristics such as annual growth rate, commit types, change type ratio, and size taxonomy, which indicate code churn. Likewise, we investigate code velocity in terms of code review periods, i.e., time-to-first-response, time-to-accept, and time-to-merge. Our study provides evidence that software evolves differently at different abstraction layers: kernel and non-kernel. The study finds similarities in the code base growth rate and the distribution of commit types (neutral, additive, and subtractive) across BSD subsystems; however, (a) most commits contain either kernel or non-kernel code, (b) kernel commits are larger than non-kernel commits, and (c) code reviews for kernel code take longer than for non-kernel code.
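A compact illustration of the commit-type classification this kind of mining relies on: parse `git log --numstat` and label each commit additive, subtractive, or neutral from its line additions and deletions. The labeling rule and the path-based kernel heuristic below are simplifying assumptions, not the paper's exact definitions.

```python
# Hypothetical sketch: classify commits as additive/subtractive/neutral from git history.
import subprocess
from collections import Counter

def commit_stats(repo: str = "."):
    """Yield (commit_hash, added, deleted, touches_kernel) from `git log --numstat`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--format=@%H"],
        capture_output=True, text=True, check=True).stdout
    for block in out.split("@")[1:]:
        lines = block.strip().splitlines()
        commit, added, deleted, kernel = lines[0], 0, 0, False
        for row in lines[1:]:
            parts = row.split("\t")
            if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
                added += int(parts[0])
                deleted += int(parts[1])
                # Assumed heuristic: kernel code lives under sys/ (common BSD layout).
                kernel = kernel or parts[2].startswith("sys/")
        yield commit, added, deleted, kernel

def commit_type(added: int, deleted: int) -> str:
    if added > deleted:
        return "additive"
    if deleted > added:
        return "subtractive"
    return "neutral"

if __name__ == "__main__":
    counts = Counter(commit_type(a, d) for _, a, d, _ in commit_stats())
    print(counts)
```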
{"title":"Is Kernel Code Different From Non-Kernel Code? A Case Study of BSD Family Operating Systems","authors":"Gunnar Kudrjavets, Jeff Thomas, Nachiappan Nagappan, Ayushi Rastogi","doi":"10.1109/ICSME55016.2022.00027","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00027","url":null,"abstract":"Studies on software evolution explore code churn and code velocity at the abstraction level of a company or an entire project. We argue that this approach misses the differences among abstractions layers and subsystems of large projects. We conduct a case study on four BSD family operating systems: DragonFlyBSD, FreeBSD, NetBSD, and OpenBSD, to investigate the evolution of code churn and code velocity across kernel and non-kernel code. We mine commits for characteristics such as annual growth rate, commit types, change type ratio, and size taxonomy, indicating code churn. Likewise, we investigate code velocity in terms of code review periods, i.e., time-to-first-response, time-to-accept, and time-to-merge.Our study provides evidence that software evolves differently at abstraction layers: kernel and non-kernel. The study finds similarities in the code base growth rate and distribution of commit types (neutral, additive, and subtractive) across BSD subsystems, however, (a) most commits contain either kernel or non-kernel code, (b) kernel commits are larger than non-kernel commits, and (c) code reviews for kernel code take longer than non-kernel code.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129928097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0