Pub Date: 2024-06-26 | DOI: 10.1016/j.scico.2024.103168
Simona Prokić, Nikola Luburić, Jelena Slivka, Aleksandar Kovačević
Code smells are structures in code that indicate potential software maintainability issues. Manually constructing high-quality datasets to train ML models for code smell detection is challenging. Inconsistent annotations, small size, a non-realistic smell-to-non-smell ratio, and poor smell coverage all hinder dataset quality. These issues arise mainly from the time-consuming nature of manual annotation and from annotators' disagreements caused by ambiguous and vague smell definitions.
To address challenges related to building high-quality datasets suitable for training ML models for smell detection, we designed a prescriptive procedure for manual code smell annotation. The proposed procedure represents an extension of our previous work, aiming to support the annotation of any smell defined by Fowler. We validated the procedure by employing three annotators to annotate smells following the proposed annotation procedure.
The main contribution of this paper is a prescriptive annotation procedure that benefits the following stakeholders: annotators building high-quality smell datasets that can be used to train ML models, ML researchers building ML models for smell detection, and software engineers employing ML models to enhance software maintainability. Secondary contributions are a code smell dataset covering Data Class, Feature Envy, and Refused Bequest, and the DataSet Explorer tool, which supports annotators during the annotation procedure.
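Since annotator disagreement is central to the motivation above, a chance-corrected agreement measure such as Cohen's kappa is commonly computed when consolidating smell annotations. A minimal sketch follows; the paper's actual consolidation procedure may differ:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same
    items. Returns 1.0 for perfect agreement, 0.0 for chance-level.
    Undefined (division by zero) if expected agreement is exactly 1."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence of the two annotators.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, two annotators agreeing on 3 of 4 instances with the class frequencies below yield a kappa of 0.5, noticeably lower than the raw 75% agreement.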
Title: Prescriptive procedure for manual code smell annotation (Science of Computer Programming, Volume 238, Article 103168).
Artificial intelligence has been widely applied in software engineering areas such as code recommendation. Significant progress has been made in code recommendation for static languages in recent years, but it remains challenging for dynamic languages like Python as accurately determining data flows before runtime is difficult. This limitation hinders data flow analysis, affecting the performance of code recommendation methods that rely on code analysis. In this study, a graph-based Python recommendation approach (GraphPyRec) is proposed by converting source code into a graph representation that captures both semantic and dynamic information. Nodes represent semantic information, with unique rules defined for various code statements. Edges depict control flow and data flow, utilizing a child-sibling-like process and a dedicated algorithm for data transfer extraction. Alongside the graph, a bag of words is created to include essential names, and a pre-trained BERT model transforms it into vectors. These vectors are integrated into a Gated Graph Neural Network (GGNN) process of the code recommendation model, enhancing its effectiveness and accuracy. To validate the proposed method, we crawled over a million lines of code from GitHub. Experimental results show that GraphPyRec outperforms existing mainstream Python code recommendation methods, achieving Top-1, 5, and 10 accuracy rates of 68.52%, 88.92%, and 94.05%, respectively, along with a Mean Reciprocal Rank (MRR) of 0.772.
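The node-and-edge construction described above can be sketched for Python source using the standard `ast` module. The coarse node labels and sibling edges here are a simplification for illustration, not GraphPyRec's exact rules:

```python
import ast

def code_to_graph(source):
    """Convert Python source into (nodes, edges).
    Nodes carry the AST node type as a coarse semantic label; edges link
    each node to its children plus consecutive siblings, loosely echoing
    the child-sibling-like process the paper describes. Simplified sketch."""
    tree = ast.parse(source)
    nodes, edges = [], []

    def visit(node, parent_id):
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))  # structural (parent) edge
        prev = None
        for child in ast.iter_child_nodes(node):
            child_id = visit(child, node_id)
            if prev is not None:
                edges.append((prev, child_id))  # sibling (ordering) edge
            prev = child_id
        return node_id

    visit(tree, None)
    return nodes, edges
```

In a full pipeline the resulting graph, together with vectorized identifier names, would feed a GGNN; this sketch stops at the graph representation.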
Title: GraphPyRec: A novel graph-based approach for fine-grained Python code recommendation. Authors: Xing Zong, Shang Zheng, Haitao Zou, Hualong Yu, Shang Gao. Pub Date: 2024-06-18 | DOI: 10.1016/j.scico.2024.103166 (Science of Computer Programming, Volume 238, Article 103166).
Pub Date: 2024-06-14 | DOI: 10.1016/j.scico.2024.103167
Ying Wang , Tao Zhang , Xiapu Luo , Peng Liang
Title: Special Issue on Selected Tools from the Tool Track of the 30th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2023 Tool Track). Science of Computer Programming, Volume 238, Article 103167.
Graph neural networks have proven their effectiveness across a wide spectrum of graph-based tasks. Despite their successes, they share the same limitations as other deep learning architectures and pose additional challenges for their formal verification. To overcome these problems, we proposed a specification language, μG, that can be used to program graph neural networks. This language has been implemented in a Python library called libmg that handles the definition, compilation, visualization, and explanation of μG graph neural network models. We illustrate its usage by showing how it was used to implement a Computation Tree Logic model checker in our previous work, and evaluate its performance on the benchmarks of the Model Checking Contest. In the future, we plan to use μG to further investigate the issues of explainability and verification of graph neural networks.
Title: libmg: A Python library for programming graph neural networks in μG. Authors: Matteo Belenchia, Flavio Corradini, Michela Quadrini, Michele Loreti. Pub Date: 2024-06-14 | DOI: 10.1016/j.scico.2024.103165 (Science of Computer Programming, Volume 238, Article 103165).
Enhancing software reliability, dependability, and security requires effective identification and mitigation of defects during early development stages. Software defect prediction (SDP) models have emerged as valuable tools for this purpose. However, there is currently a lack of consensus in evaluating the predictive performance of newly proposed models, which hinders accurate measurement of progress and can lead to misleading conclusions. To tackle this challenge, we present MATTER (a fraMework towArd a consisTenT pErformance compaRison), which aims to provide reliable and consistent performance comparisons for SDP models. MATTER incorporates three key considerations. First, it establishes a global reference point, ONE (glObal baseliNe modEl), which possesses the 3S properties (Simplicity in implementation, Strong predictive ability, and Stable prediction performance), to serve as the baseline for evaluating other models. Second, it proposes using the SQA-effort-aligned threshold setting to ensure fair performance comparisons. Third, it advocates for consistent performance evaluation by adopting a set of core performance indicators that reflect the practical value of prediction models in achieving tangible progress. Through the application of MATTER to the same benchmark data sets, researchers and practitioners can obtain more accurate and meaningful insights into the performance of defect prediction models, thereby facilitating informed decision-making and improving software quality. When evaluating representative SDP models from recent years using MATTER, we observed, surprisingly, that none of these models demonstrated a notable enhancement in prediction performance compared to the simple baseline model ONE. In future studies, we strongly recommend the adoption of MATTER to assess the actual usefulness of newly proposed models, promoting reliable scientific progress in defect prediction.
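The idea of effort-aligned comparison can be illustrated with a small effort-aware recall computation: two models are compared at the same inspection budget measured in lines of code. The indicator below is a generic sketch, not necessarily one of MATTER's exact core indicators:

```python
def recall_at_effort(scores, locs, labels, effort_ratio=0.2):
    """Effort-aware evaluation sketch: inspect modules in descending score
    order until `effort_ratio` of total LOC is consumed, then report the
    fraction of defective modules caught within that budget.
    scores: predicted defect-proneness per module
    locs:   module sizes in LOC (proxy for SQA inspection effort)
    labels: 1 if the module is actually defective, else 0"""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    budget = effort_ratio * sum(locs)
    spent, caught = 0.0, 0
    for i in order:
        if spent + locs[i] > budget:
            break  # next module would exceed the inspection budget
        spent += locs[i]
        caught += labels[i]
    total_defects = sum(labels)
    return caught / total_defects if total_defects else 0.0
```

Holding the effort budget fixed is what makes the comparison between a candidate model and a simple baseline fair: both get to "spend" the same review effort.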
Title: Towards a framework for reliable performance evaluation in defect prediction. Authors: Xutong Liu, Shiran Liu, Zhaoqiang Guo, Peng Zhang, Yibiao Yang, Huihui Liu, Hongmin Lu, Yanhui Li, Lin Chen, Yuming Zhou. Pub Date: 2024-06-12 | DOI: 10.1016/j.scico.2024.103164 (Science of Computer Programming, Volume 238, Article 103164).
Pub Date: 2024-06-07 | DOI: 10.1016/j.scico.2024.103155
Chi Zhang , Jinfu Chen , Saihua Cai , Wen Zhang , Rexford Nii Ayitey Sosu , Haibo Chen
Compilers play a critical role in current software construction. However, vulnerabilities or bugs within the compiler can pose significant challenges to ensuring the security of the resultant software. In recent years, many compilers have made use of testing techniques to address and mitigate such concerns. Among these techniques, fuzzing is widely used to detect software bugs. However, when fuzzing compilers, there are still shortcomings in the diversity and validity of test cases. This paper introduces TR-Fuzz, a Transformer-based fuzzing tool specifically designed for C compilers. Leveraging position embedding and multi-head attention mechanisms, TR-Fuzz establishes relationships among data, facilitating the generation of well-formed C programs for compiler testing. In addition, we use different generation strategies in the program generation process to improve the performance of TR-Fuzz. We validate the effectiveness of TR-Fuzz through comparison with existing fuzzing tools for C compilers. The experimental results show that TR-Fuzz increases the pass rate of the generated C programs by an average of about 12% and improves the coverage of programs under test compared with the existing tools. Benefiting from the improved pass rate and coverage, we found five bugs in GCC-9.
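The pass-rate metric reported above can be sketched generically. In the sketch below, Python's own parser stands in for the C compiler front end purely so the example is self-contained; a real TR-Fuzz-style harness would invoke the C compiler (e.g. a syntax-only compile) on each generated program instead:

```python
def pass_rate(programs, check):
    """Fraction of generated programs accepted by the compiler front end.
    `check` abstracts the compiler invocation; any callable mapping a
    program's source text to True/False works."""
    if not programs:
        return 0.0
    return sum(1 for p in programs if check(p)) / len(programs)

def python_syntax_ok(src):
    """Stand-in validity check using Python's own parser; illustrative
    only -- a C-compiler fuzzer would shell out to the compiler here."""
    try:
        compile(src, "<fuzz>", "exec")
        return True
    except SyntaxError:
        return False
```

A generator producing two valid programs out of three would score a pass rate of 2/3 under this metric.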
Title: TR-Fuzz: A syntax valid tool for fuzzing C compilers (Science of Computer Programming, Volume 238, Article 103155).
Pub Date: 2024-06-06 | DOI: 10.1016/j.scico.2024.103157
Tom Lauwaerts , Stefan Marr , Christophe Scholliers
Testing is an essential part of the software development cycle. Unfortunately, testing on constrained devices is currently very challenging. First, the limited memory of constrained devices severely restricts the size of test suites. Second, the limited processing power causes test suites to execute slowly, preventing a fast feedback loop. Third, when the constrained device becomes unresponsive, it is impossible to distinguish between a test that fails and one that simply takes very long, forcing the developer to work with timeouts. Unfortunately, timeouts can cause tests to be flaky, i.e., have unpredictable outcomes independent of code changes. Given these problems, most IoT developers rely on laborious manual testing.
In this paper, we propose the novel testing framework Latch (Large-scale Automated Testing on Constrained Hardware) to overcome the three main challenges of running large test suites on constrained hardware, as well as automate manual testing scenarios through a novel testing methodology based on debugger-like operations—we call this new testing approach managed testing.
The core idea of Latch is to enable testing on constrained devices without those devices maintaining the whole test suite in memory. Therefore, programmers script and run tests on a workstation which then step-wise instructs the constrained device to execute each test, thereby overcoming the memory constraints. Our testing framework further allows developers to mark tests as depending on other tests. This way, Latch can skip tests that depend on previously failing tests resulting in a faster feedback loop. Finally, Latch addresses the issue of timeouts and flaky tests by including an analysis mode that provides feedback on timeouts and the flakiness of tests.
To illustrate the expressiveness of Latch, we present testing scenarios representing unit testing, integration testing, and end-to-end testing. We evaluate the performance of Latch by testing a virtual machine against the WebAssembly specification, with a large test suite consisting of 10,213 tests running on an ESP32 microcontroller. Our experience shows that the testing framework is expressive, reliable, and reasonably fast, making it suitable for running large test suites on constrained devices. Furthermore, the debugger-like operations enable Latch to closely mimic manual testing.
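The dependency-based skipping described above can be sketched as a small test runner that skips any test whose prerequisite failed or was itself skipped. Names and the result encoding are illustrative, not Latch's actual API:

```python
def run_with_dependencies(tests, depends_on, run):
    """Execute tests in order, skipping tests whose dependency did not pass,
    shortening the feedback loop as Latch does.
    tests:      ordered list of test names
    depends_on: maps a test name to its optional prerequisite test
    run:        executes one test, returning True (pass) or False (fail)"""
    results = {}
    for name in tests:
        dep = depends_on.get(name)
        if dep is not None and results.get(dep) is not True:
            results[name] = "skipped"  # prerequisite failed or was skipped
            continue
        results[name] = run(name)
    return results
```

Skips propagate transitively: if `a` fails and `c` depends on `b` which depends on `a`, both `b` and `c` are skipped without being executed on the device.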
Title: Latch: Enabling large-scale automated testing on constrained systems (Science of Computer Programming, Volume 238, Article 103157).
Pub Date: 2024-06-06 | DOI: 10.1016/j.scico.2024.103156
Jinfu Chen , Yemin Yin , Saihua Cai , Weijia Wang , Shengran Wang , Jiming Chen
Software vulnerability detection is a challenging task in the security field, and the boom in deep learning technology has promoted the development of automatic vulnerability detection. Compared with sequence-based deep learning models, graph neural networks (GNNs) can learn the structural features of code, so they perform well in vulnerability detection for source code. However, different GNNs produce different detection results for the same code, and using a single kind of GNN may lead to high false positive and false negative rates. In addition, the complex structure of source code means that a single GNN model cannot effectively learn its deep features, leading to low detection accuracy. To address these limitations, we propose a software vulnerability detection model called iGnnVD based on integrated graph neural networks. In the proposed iGnnVD model, base detectors including GCN, GAT, and APPNP are first constructed to capture bidirectional information in the code graph structure; then, residual connections are used to aggregate features while retaining the features at each step; finally, a convolutional layer performs the aggregated classification. In addition, an integration module that analyzes the detection results of the three detectors for final classification is designed using a voting strategy, to address the high false positive and false negative rates caused by using a single kind of base detector. We perform extensive experiments on three datasets; the experimental results show that the proposed iGnnVD model improves the detection accuracy of vulnerabilities in source code and reduces the false positive and false negative rates compared with existing deep learning-based vulnerability detection models, while also exhibiting good stability.
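The voting-based integration module can be illustrated with a minimal majority vote over the three base detectors' verdicts for one code sample; the paper's exact scheme may weight or combine detectors differently:

```python
def majority_vote(predictions):
    """Combine per-detector verdicts (e.g. from GCN, GAT, and APPNP) for a
    single code sample by strict majority vote.
    predictions: list of 0/1 verdicts, one per base detector
    Returns 1 (vulnerable) only if more than half the detectors agree."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0
```

With three detectors, a sample is flagged only when at least two agree, which is exactly how an ensemble can suppress the spurious positives and negatives of any single detector.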
Title: iGnnVD: A novel software vulnerability detection model based on integrated graph neural networks (Science of Computer Programming, Volume 238, Article 103156).
Bounded exhaustive testing is a very effective technique for bug finding, which proposes to test a given program under all valid bounded inputs, for a bound provided by the developer. Existing bounded exhaustive testing techniques require the developer to provide a precise specification of the valid inputs. Such specifications are rarely present as part of the software under test, and writing them can be costly and challenging.
To address this situation we propose BEAPI, a tool that, given a Java class under test, generates a bounded exhaustive set of objects of the class solely by employing the methods of the class, without the need for a specification. BEAPI creates sequences of calls to methods from the class' public API and executes them to generate inputs. BEAPI implements very effective pruning techniques that allow it to generate inputs efficiently.
We experimentally assessed BEAPI in several case studies from the literature, and showed that it performs comparably to the best existing specification-based bounded exhaustive generation tool (Korat), without requiring a specification of the valid inputs.
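The generation strategy the BEAPI abstract describes, building objects by executing ever-longer sequences of public API calls and pruning states already seen, can be sketched in a few lines. This is a hedged illustration under assumptions: the `IntSet` class and the state-matching prune are invented for the example and are not BEAPI's actual (Java) interface or algorithm details.

```python
# Sketch of API-based bounded exhaustive generation: starting from a fresh
# object, extend method-call sequences breadth-first up to a length bound,
# execute each call, and prune any object whose observable state was
# already produced by a shorter sequence.
import copy
import itertools

class IntSet:
    """Hypothetical class under test with a small public API."""
    def __init__(self):
        self.elems = []
    def add(self, x):
        if x not in self.elems:
            self.elems.append(x)
    def remove(self, x):
        if x in self.elems:
            self.elems.remove(x)

def bounded_exhaustive(bound, values=(0, 1)):
    seen = {()}                  # canonical states generated so far
    frontier = [IntSet()]
    result = [IntSet()]
    for _ in range(bound):       # sequences of length 1..bound
        next_frontier = []
        for obj in frontier:
            for method, arg in itertools.product(("add", "remove"), values):
                candidate = copy.deepcopy(obj)
                getattr(candidate, method)(arg)      # execute the API call
                state = tuple(sorted(candidate.elems))
                if state not in seen:                # state-matching prune
                    seen.add(state)
                    next_frontier.append(candidate)
                    result.append(candidate)
        frontier = next_frontier
    return result

objs = bounded_exhaustive(bound=2)
print(sorted(tuple(sorted(o.elems)) for o in objs))  # → [(), (0,), (0, 1), (1,)]
```

With bound 2 and argument values {0, 1}, the generator reaches every distinct set state exactly once; redundant sequences such as `add(0); remove(0)` are pruned because they reproduce an already-seen state.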
{"title":"BEAPI: A tool for bounded exhaustive input generation from APIs","authors":"Mariano Politano , Valeria Bengolea , Facundo Molina , Nazareno Aguirre , Marcelo Frias , Pablo Ponzio","doi":"10.1016/j.scico.2024.103153","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103153","url":null,"abstract":"<div><p>Bounded exhaustive testing is a very effective technique for bug finding, which proposes to test a given program under all valid bounded inputs, for a bound provided by the developer. Existing bounded exhaustive testing techniques require the developer to provide a precise specification of the valid inputs. Such specifications are rarely present as part of the software under test, and writing them can be costly and challenging.</p><p>To address this situation we propose BEAPI, a tool that given a Java class under test, generates a bounded exhaustive set of objects of the class solely employing the methods of the class, without the need for a specification. BEAPI creates sequences of calls to methods from the class' public API, and executes them to generate inputs. BEAPI implements very effective pruning techniques that allow it to generate inputs efficiently.</p><p>We experimentally assessed BEAPI in several case studies from the literature, and showed that it performs comparably to the best existing specification-based bounded exhaustive generation tool (Korat), without requiring a specification of the valid inputs.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103153"},"PeriodicalIF":1.3,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141294635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Symbolic execution is a software verification technique that runs programs on symbolic inputs to check for bugs. Ranged symbolic execution performs symbolic execution on program parts, so-called path ranges, in parallel. Due to this parallelism, verification is accelerated and hence scales to larger programs.
In this paper, we discuss a generalization of ranged symbolic execution to arbitrary program analyses. More specifically, we present a verification approach that splits programs into path ranges and then runs arbitrary analyses on the ranges in parallel. In particular, our approach allows running different analyses on different program parts. We have implemented this generalization on top of the tool CPAchecker and evaluated it on programs from the SV-COMP benchmark. Our evaluation shows that verification can benefit from parallelizing the verification task, but also needs a form of work stealing (between analyses) to become efficient.
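The parallelization-plus-work-stealing idea in this abstract can be sketched with a shared work queue: idle workers pull ("steal") the next unprocessed range, so a fast analysis is never left waiting on a slow one. This is a minimal sketch under assumptions; the integer ranges and the divisibility "bug check" are invented stand-ins, not CPAchecker's path ranges or analyses.

```python
# Sketch of running analyses on program parts in parallel. Path ranges sit
# in a shared queue; each worker thread repeatedly takes the next available
# range, so work is balanced dynamically rather than assigned up front.
from concurrent.futures import ThreadPoolExecutor
import queue

def analyze_range(path_range):
    """Stand-in analysis: flag inputs divisible by 7 as 'bugs'."""
    return [x for x in path_range if x % 7 == 0]

def parallel_verify(path_ranges, workers=3):
    work = queue.Queue()
    for r in path_ranges:
        work.put(r)
    findings = []

    def worker():
        while True:
            try:
                r = work.get_nowait()   # idle worker grabs the next range
            except queue.Empty:
                return                  # no work left for this analysis
            findings.extend(analyze_range(r))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers):
            pool.submit(worker)
    # pool context waits for all workers before we aggregate the verdict
    return sorted(findings)

ranges = [range(0, 10), range(10, 20), range(20, 30)]
print(parallel_verify(ranges))  # prints [0, 7, 14, 21, 28]
```

Because workers pull ranges on demand instead of receiving a fixed partition, a worker that finishes its range early immediately picks up remaining work, which is the load-balancing effect the evaluation attributes to work stealing.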
{"title":"Parallel program analysis on path ranges","authors":"Jan Haltermann , Marie-Christine Jakobs , Cedric Richter , Heike Wehrheim","doi":"10.1016/j.scico.2024.103154","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103154","url":null,"abstract":"<div><p>Symbolic execution is a software verification technique symbolically running programs and thereby checking for bugs. Ranged symbolic execution performs symbolic execution on program parts, so-called <em>path ranges</em>, in parallel. Due to the parallelism, verification is accelerated and hence scales to larger programs.</p><p>In this paper, we discuss a generalization of ranged symbolic execution to arbitrary program analyses. More specifically, we present a verification approach that splits programs into path ranges and then runs arbitrary analyses on the ranges in parallel. Our approach in particular allows to run <em>different</em> analyses on different program parts. We have implemented this generalization on top of the tool <span>CPAchecker</span> and evaluated it on programs from the SV-COMP benchmark. Our evaluation shows that verification can benefit from the parallelization of the verification task, but also needs a form of work stealing (between analyses) to become efficient.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103154"},"PeriodicalIF":1.3,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167642324000777/pdfft?md5=c9721851a6e6fced1e9f8337cb568046&pid=1-s2.0-S0167642324000777-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141294633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}