
Latest Publications in IEEE Transactions on Software Engineering

Consistent Local-First Software: Enforcing Safety and Invariants for Local-First Applications
IF 7.4 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-10 DOI: 10.1109/tse.2024.3477723
Mirko Köhler, George Zakhour, Pascal Weisenburger, Guido Salvaneschi
Citations: 0
Enhancing Bug-Inducing Commit Identification: A Fine-Grained Semantic Analysis Approach
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-09 DOI: 10.1109/TSE.2024.3468296
Lingxiao Tang;Chao Ni;Qiao Huang;Lingfeng Bao
The SZZ algorithm and its variants have been extensively utilized for identifying bug-inducing commits based on bug-fixing commits. However, these algorithms face challenges when there are no deletion lines in the bug-fixing commit. Previous studies have attempted to address this issue by tracing back all lines in the block that encapsulates the added lines. However, this method is too coarse-grained and suffers from low precision. To address this issue, we propose Sem-SZZ, a novel method based on fine-grained semantic analysis. We first observe that a significant number of bug-inducing commits can be identified by tracing back the unmodified lines near added lines, resulting in improved precision and F1-score. Building on this observation, we conduct a more fine-grained semantic analysis. We begin by performing program slicing to extract the program part near the added lines. Subsequently, we compare the program's states between the previous version and the current version, focusing on data-flow and control-flow differences in the extracted program part. Finally, we extract the statements contributing to the bug based on these differences and use them to locate bug-inducing commits. We also extend our approach to the scenario where the bug-fixing commit contains deleted lines. Experimental results demonstrate that Sem-SZZ outperforms the state-of-the-art methods in identifying bug-inducing commits, regardless of whether the bug-fixing commit contains deleted lines.
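The backtracking idea can be sketched as follows: instead of tracing back every line of the block enclosing the additions, trace only the unmodified lines within a small window around each added line. The diff representation, window size, and function name below are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of the backtracking step for a bug-fixing commit that contains
# only added lines. Each unmodified neighbor it returns would then be traced
# back (e.g., via `git blame`) to a candidate bug-inducing commit.

def lines_to_trace(added_lines, file_length, window=2):
    """Return the unmodified line numbers near added lines to trace back."""
    candidates = set()
    added = set(added_lines)
    for line in added_lines:
        for offset in range(-window, window + 1):
            neighbor = line + offset
            # Keep only lines that exist and were NOT added by the fix itself.
            if 1 <= neighbor <= file_length and neighbor not in added:
                candidates.add(neighbor)
    return sorted(candidates)

# A fix that only adds lines 10 and 11 to a 20-line file:
print(lines_to_trace([10, 11], 20))  # [8, 9, 12, 13]
```

The window restricts tracing to nearby context, which is what distinguishes this fine-grained scheme from tracing the whole enclosing block.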
Citations: 0
Automated Refactoring of Non-Idiomatic Python Code With Pythonic Idioms
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-09 DOI: 10.1109/TSE.2024.3420886
Zejun Zhang;Zhenchang Xing;Dehai Zhao;Xiwei Xu;Liming Zhu;Qinghua Lu
Compared to other programming languages (e.g., Java), Python has more idioms that make code concise and efficient. Although Pythonic idioms are well accepted in the Python community, Python programmers often face challenges in using them: for example, they may be unaware of certain Pythonic idioms or unsure how to apply them properly. Based on an analysis of 7,577 Python repositories on GitHub, we find that non-idiomatic Python code that could be implemented with Pythonic idioms occurs frequently and widely. To assist Python developers in adopting Pythonic idioms, we design and implement an automatic refactoring tool named RIdiom that refactors code with Pythonic idioms. We identify twelve Pythonic idioms by systematically contrasting the abstract syntax grammars of Python and Java. We then define syntactic patterns for detecting non-idiomatic code for each Pythonic idiom. Finally, we devise atomic AST-rewriting operations and refactoring steps to refactor non-idiomatic code into idiomatic code. Our approach is evaluated on 1,814 code refactorings, achieving a precision of 0.99 and a recall of 0.87, underscoring its effectiveness. We further evaluate the tool's utility in helping developers refactor code with Pythonic idioms. A user study involving 14 students demonstrates a 112.9% improvement in correctness and a 35.5% speedup when participants refer to the tool-generated code pairs. Additionally, 120 pull requests that refactor non-idiomatic code with Pythonic idioms, submitted to GitHub projects, resulted in 79 responses. Among these, 49 accepted and praised the refactorings, with 42 merging the refactorings into their repositories.
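The kind of non-idiomatic-to-idiomatic rewrite such a tool targets can be illustrated with a simple pair; this example is an assumed illustration and does not reproduce the paper's twelve idioms.

```python
# Non-idiomatic: index-based iteration and manual list building,
# the pattern a Java programmer might carry over to Python.
def squares_nonidiomatic(values):
    result = []
    for i in range(len(values)):
        result.append(values[i] * values[i])
    return result

# Idiomatic: a list comprehension expresses the same computation directly.
def squares_idiomatic(values):
    return [v * v for v in values]

# Both behave identically; the second is shorter and avoids manual indexing.
assert squares_nonidiomatic([1, 2, 3]) == squares_idiomatic([1, 2, 3]) == [1, 4, 9]
```

A behavior-preserving AST rewrite of this kind is exactly what "atomic AST-rewriting operations" would produce for the list-comprehension idiom.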
Citations: 0
Exploring the Effectiveness of LLMs in Automated Logging Statement Generation: An Empirical Study
IF 7.4 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-08 DOI: 10.1109/tse.2024.3475375
Yichen Li, Yintong Huo, Zhihan Jiang, Renyi Zhong, Pinjia He, Yuxin Su, Lionel C. Briand, Michael R. Lyu
Citations: 0
Multitask-Based Evaluation of Open-Source LLM on Software Vulnerability
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-07 DOI: 10.1109/TSE.2024.3470333
Xin Yin;Chao Ni;Shaohua Wang
This paper proposes a pipeline for quantitatively evaluating interactive Large Language Models (LLMs) using publicly available datasets. We carry out an extensive technical evaluation of LLMs using Big-Vul covering four different common software vulnerability tasks. This evaluation assesses the multi-tasking capabilities of LLMs based on this dataset. We find that the existing state-of-the-art approaches and pre-trained Language Models (LMs) are generally superior to LLMs in software vulnerability detection. However, in software vulnerability assessment and location, certain LLMs (e.g., CodeLlama and WizardCoder) have demonstrated superior performance compared to pre-trained LMs, and providing more contextual information can enhance the vulnerability assessment capabilities of LLMs. Moreover, LLMs exhibit strong vulnerability description capabilities, but their tendency to produce excessive output significantly weakens their performance compared to pre-trained LMs. Overall, though LLMs perform well in some aspects, they still need improvement in understanding the subtle differences in code vulnerabilities and the ability to describe vulnerabilities to fully realize their potential. Our evaluation pipeline provides valuable insights into the capabilities of LLMs in handling software vulnerabilities.
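As an illustration of the scoring step such a quantitative pipeline applies to the binary detection task, the sketch below computes precision, recall, and F1 over invented predictions and labels; in the paper the ground truth comes from the Big-Vul dataset, and nothing here is the authors' code.

```python
def precision_recall_f1(predictions, labels):
    """Score binary vulnerability-detection predictions (1 = vulnerable)."""
    tp = sum(p == 1 and l == 1 for p, l in zip(predictions, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(predictions, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented model outputs versus invented ground-truth labels:
preds, gold = [1, 1, 0, 1, 0], [1, 0, 0, 1, 1]
print(precision_recall_f1(preds, gold))
```

Comparing these scores across models (LLMs versus pre-trained LMs) per task is the kind of head-to-head measurement the evaluation describes.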
Citations: 0
Qualitative Surveys in Software Engineering Research: Definition, Critical Review, and Guidelines
IF 7.4 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-04 DOI: 10.1109/tse.2024.3474173
Jorge Melegati, Kieran Conboy, Daniel Graziotin
Citations: 0
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair
IF 7.4 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-02 DOI: 10.1109/tse.2024.3472476
Sakina Fatima, Hadi Hemmati, Lionel Briand
Citations: 0
LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-30 DOI: 10.1109/TSE.2024.3469582
Rongqi Pan;Taher A. Ghaleb;Lionel C. Briand
Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (FDR) while the latter is faster. To address scalability while retaining a high FDR, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time. 
Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (41.72% versus 41.02%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.
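The two similarity measures named above can be sketched over plain Python lists; in LTM the inputs would be test-method embeddings produced by a model such as UniXcoder, and the vectors below are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two made-up 3-dimensional "embeddings" of test methods:
v1, v2 = [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]
print(round(cosine_similarity(v1, v2), 4))   # 0.8165
print(round(euclidean_distance(v1, v2), 4))  # 1.0
```

Either measure can serve as the pairwise-similarity signal that the Genetic Algorithm's fitness function aggregates when searching for a minimized suite; cosine similarity is scale-invariant, which is one reason it pairs well with learned embeddings.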
Citations: 0
Fast and Precise Static Null Exception Analysis With Synergistic Preprocessing
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-23 DOI: 10.1109/TSE.2024.3466551
Yi Sun;Chengpeng Wang;Gang Fan;Qingkai Shi;Xiangyu Zhang
Pointer operations are common in programs written in modern programming languages such as C/C++ and Java. While widely used, pointer operations often suffer from bugs like null pointer exceptions that make software systems vulnerable and unstable. However, precisely verifying the absence of null pointer exceptions is notoriously slow as we need to inspect a huge number of pointer-dereferencing operations one by one via expensive techniques like SMT solving. We observe that, among all pointer-dereferencing operations in a program, a large number can be proven to be safe by lightweight preprocessing. Thus, we can avoid employing costly techniques to verify their nullity. The impacts of lightweight preprocessing techniques are significantly less studied and ignored by recent works. In this paper, we propose a new technique, BONA, which leverages the synergistic effects of two classic preprocessing analyses. The synergistic effects between the two preprocessing analyses allow us to recognize a lot more safe pointer operations before a follow-up costly nullity verification, thus improving the scalability of the whole null exception analysis. We have implemented our synergistic preprocessing procedure in two state-of-the-art static analyzers, KLEE and Pinpoint. The evaluation results demonstrate that BONA itself is fast and can finish in a few seconds for programs that KLEE and Pinpoint may require several minutes or even hours to analyze. Compared to the vanilla versions of KLEE and Pinpoint, BONA respectively enables them to achieve up to 1.6x and 6.6x speedup (1.2x and 3.8x on average) with less than 0.5% overhead. Such a speedup is significant enough as it allows KLEE and Pinpoint to check more pointer-dereferencing operations in a given time budget and, thus, discover over a dozen previously unknown null pointer exceptions in open-source projects.
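A toy illustration of the lightweight-preprocessing idea: dereferences guarded by an explicit null check can be classified as safe cheaply, leaving only the remainder for expensive SMT-based verification. The miniature straight-line IR below is an assumption made for illustration, not BONA's actual analysis.

```python
def classify_dereferences(statements):
    """statements: list of ('check', var) or ('deref', var) in program order
    within one straight-line block. Returns (safe_indices, needs_verification),
    where a dereference is "safe" if a null check on the same variable
    precedes it in the block."""
    checked = set()
    safe, unsure = [], []
    for i, (kind, var) in enumerate(statements):
        if kind == 'check':
            checked.add(var)
        elif kind == 'deref':
            (safe if var in checked else unsure).append(i)
    return safe, unsure

# p is dereferenced before and after its null check; q is never checked.
prog = [('deref', 'p'), ('check', 'p'), ('deref', 'p'), ('deref', 'q')]
print(classify_dereferences(prog))  # ([2], [0, 3])
```

Only the dereferences in the second list would be handed to the costly verifier, which is how cheap preprocessing shrinks the expensive workload.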
Citations: 0
Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-20 DOI: 10.1109/TSE.2024.3465222
Danniell Hu;Priscila Santiesteban;Madeline Endres;Westley Weimer
Debugging is a vital and time-consuming process in software engineering. Recently, researchers have begun using neuroimaging to understand the cognitive bases of programming tasks by measuring patterns of neural activity. While exciting, prior studies have only examined small sub-steps in isolation, such as comprehending a method without writing any code or writing a method from scratch without reading any already-existing code. We propose a simple multi-stage debugging model in which programmers transition between Task Comprehension, Fault Localization, Code Editing, Compiling, and Output Comprehension activities. We conduct a human study of n = 28 participants using a combination of functional near-infrared spectroscopy and standard coding measurements (e.g., time taken, tests passed, etc.). Critically, we find that our proposed debugging stages are both neurally and behaviorally distinct. To the best of our knowledge, this is the first neurally-justified cognitive model of debugging. At the same time, there is significant interest in understanding how programmers from different backgrounds, such as those grappling with challenges in English prose comprehension, are impacted by code features when debugging. We use our cognitive model of debugging to investigate the role of one such feature: identifier construction. Specifically, we investigate how features of identifier construction impact neural activity while debugging by participants with and without reading difficulties. While we find significant differences in cognitive load as a function of morphology and expertise, we do not find significant differences in end-to-end programming outcomes (e.g., time, correctness, etc.). This nuanced result suggests that prior findings on the cognitive importance of identifier naming in isolated sub-steps may not generalize to end-to-end debugging.
Finally, in a result relevant to broadening participation in computing, we find no behavioral outcome differences for participants with reading difficulties.
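The five activities named in the abstract (Task Comprehension, Fault Localization, Code Editing, Compiling, Output Comprehension) can be sketched as a simple stage-trace model. The class and method names below are illustrative assumptions for exposition; the paper does not prescribe this bookkeeping, only the stage decomposition itself.

```python
from enum import Enum, auto

class Stage(Enum):
    TASK_COMPREHENSION = auto()
    FAULT_LOCALIZATION = auto()
    CODE_EDITING = auto()
    COMPILING = auto()
    OUTPUT_COMPREHENSION = auto()

class DebuggingSession:
    """Records the sequence of stages a programmer passes through.

    Illustrative sketch only: the paper's model does not restrict which
    transitions are legal, so any stage may follow any other.
    """

    def __init__(self):
        self.trace: list[Stage] = []

    def enter(self, stage: Stage) -> None:
        # Append a stage transition, collapsing immediate repeats
        # so the trace records distinct visits, not dwell time.
        if not self.trace or self.trace[-1] is not stage:
            self.trace.append(stage)

    def visits(self, stage: Stage) -> int:
        # Number of distinct visits the session made to a stage.
        return self.trace.count(stage)

# Example: a debugging episode that revisits Fault Localization
# after observing the program's output.
session = DebuggingSession()
for s in (Stage.TASK_COMPREHENSION, Stage.FAULT_LOCALIZATION,
          Stage.CODE_EDITING, Stage.COMPILING,
          Stage.OUTPUT_COMPREHENSION, Stage.FAULT_LOCALIZATION):
    session.enter(s)

print(len(session.trace))                       # 6
print(session.visits(Stage.FAULT_LOCALIZATION)) # 2
```

A trace of this form is the kind of behavioral sequence that could be aligned against windows of neural activity when testing whether stages are neurally distinct.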
{"title":"Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?","authors":"Danniell Hu;Priscila Santiesteban;Madeline Endres;Westley Weimer","doi":"10.1109/TSE.2024.3465222","DOIUrl":"10.1109/TSE.2024.3465222","url":null,"abstract":"Debugging is a vital and time-consuming process in software engineering. Recently, researchers have begun using neuroimaging to understand the cognitive bases of programming tasks by measuring patterns of neural activity. While exciting, prior studies have only examined small sub-steps in isolation, such as comprehending a method without writing any code or writing a method from scratch without reading any already-existing code. We propose a simple multi-stage debugging model in which programmers transition between Task Comprehension, Fault Localization, Code Editing, Compiling, and Output Comprehension activities. We conduct a human study of $n=28$ participants using a combination of functional near-infrared spectroscopy and standard coding measurements (e.g., time taken, tests passed, etc.). Critically, we find that our proposed debugging stages are both neurally and behaviorally distinct. To the best of our knowledge, this is the first neurally-justified cognitive model of debugging. At the same time, there is significant interest in understanding how programmers from different backgrounds, such as those grappling with challenges in English prose comprehension, are impacted by code features when debugging. We use our cognitive model of debugging to investigate the role of one such feature: identifier construction. Specifically, we investigate how features of identifier construction impact neural activity while debugging by participants with and without reading difficulties. While we find significant differences in cognitive load as a function of morphology and expertise, we do not find significant differences in end-to-end programming outcomes (e.g., time, correctness, etc.). This nuanced result suggests that prior findings on the cognitive importance of identifier naming in isolated sub-steps may not generalize to end-to-end debugging. Finally, in a result relevant to broadening participation in computing, we find no behavioral outcome differences for participants with reading difficulties.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"3007-3021"},"PeriodicalIF":6.5,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142275365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0