
Latest Publications in IEEE Transactions on Software Engineering

Understanding Code Understandability Improvements in Code Reviews
IF 7.4 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-10 DOI: 10.1109/tse.2024.3453783
Delano Oliveira, Reydne Santos, Benedito de Oliveira, Martin Monperrus, Fernando Castor, Fernanda Madeiral
Citations: 0
HetFL: Heterogeneous Graph-Based Software Fault Localization
IF 6.5 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-05 DOI: 10.1109/TSE.2024.3454605
Xin Chen;Tian Sun;Dongling Zhuang;Dongjin Yu;He Jiang;Zhide Zhou;Sicheng Li
Automated software fault localization has become one of the hot spots that researchers have focused on in recent years. Existing studies have shown that learning-based techniques can effectively localize faults by leveraging various kinds of information. However, these techniques have two problems. The first is that they simply represent various kinds of information without considering the contribution of each kind. The second is that they do not account for the data imbalance problem. Thus, their effectiveness is limited in practice. In this paper, we propose HetFL, a novel heterogeneous graph-based software fault localization technique that aggregates different information into a heterogeneous graph in which program entities and test cases are regarded as nodes, and coverage, change histories, and call relationships are viewed as edges. HetFL first extracts textual and structural information from source code as node attributes and integrates them to form an attribute vector. Then, for a given node, HetFL finds its neighbor nodes based on the types of edges and aggregates the corresponding neighbor nodes to form type vectors. After that, the attribute vector and all the type vectors of each node are aggregated by an attention mechanism to generate the final vector representation. Finally, we leverage a convolutional neural network (CNN) to obtain the suspiciousness score of each method. To validate the effectiveness of HetFL, experiments are conducted on the widely used dataset Defects4J (v1.2.0). The experimental results show that HetFL can localize 217 faults at Top-1, 25 more than the state-of-the-art technique DeepFL, and achieves 6.37 and 5.58 in terms of MAR and MFR, improving on DeepFL by 9.0% and 5.6%, respectively. In addition, we also perform experiments on the latest version of Defects4J (v2.0.0). The experimental results show that HetFL performs better than the baseline methods.
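The attention-based fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not HetFL's actual implementation: the `score` function, the plain-Python vectors, and the `aggregate_node` helper are assumptions standing in for the paper's learned attention parameters.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of attention scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_node(attr_vec, type_vecs, score):
    """Fuse a node's attribute vector with its per-edge-type vectors.

    `score(v)` assigns each candidate vector an attention score; the
    final representation is the attention-weighted sum of all vectors.
    """
    candidates = [attr_vec] + type_vecs
    weights = softmax([score(v) for v in candidates])
    dim = len(attr_vec)
    return [sum(w * v[i] for w, v in zip(weights, candidates))
            for i in range(dim)]
```

With equal scores, every candidate vector contributes equally, so `aggregate_node([1.0, 0.0], [[0.0, 1.0]], score=sum)` yields the element-wise mean.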
IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2884-2905.
Citations: 0
Does the Vulnerability Threaten Our Projects? Automated Vulnerable API Detection for Third-Party Libraries
IF 6.5 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-05 DOI: 10.1109/TSE.2024.3454960
Fangyuan Zhang;Lingling Fan;Sen Chen;Miaoying Cai;Sihan Xu;Lida Zhao
Developers usually use third-party libraries (TPLs) to facilitate the development of their projects and avoid reinventing the wheel; however, vulnerable TPLs can cause severe security threats. The majority of existing research only considered whether projects used vulnerable TPLs but neglected whether the vulnerable code of the TPLs was actually used by the projects, which inevitably results in false positives and further requires additional patching efforts and maintenance costs (e.g., dependency conflict issues after version upgrades). To mitigate this problem, we propose VAScanner, which can effectively identify the vulnerable root methods causing vulnerabilities in TPLs and further identify all vulnerable APIs of TPLs used by Java projects. Specifically, we first collect the initial patch methods from the patch commits and extract accurate patch methods by employing a patch-unrelated sifting mechanism; then we further identify the vulnerable root methods for each vulnerability by employing an augmentation mechanism. Based on them, we leverage backward call graph analysis to identify all vulnerable APIs for each vulnerable TPL version and construct a database consisting of 90,749 (2,410,779 with library versions) vulnerable APIs, with a 1.45% false positive proportion and a 95% confidence interval (CI) of [1.31%, 1.59%], from 362 TPLs with 14,775 versions. The database serves as a reference database to help developers detect vulnerable APIs of TPLs used by projects. Our experiments show that VAScanner eliminates 5.78% false positives and 2.16% false negatives owing to the proposed sifting and augmentation mechanisms. Besides, it outperforms Eclipse Steady, the state-of-the-art method-level vulnerability detection tool, in analyzing direct dependencies, achieving more effective detection of vulnerable APIs.
Furthermore, to investigate the real impact of vulnerabilities on real open-source projects, we use VAScanner to conduct a large-scale analysis of 3,147 projects that depend on vulnerable TPLs, and find that only 21.51% of the projects (with a 1.83% false positive proportion and a 95% CI of [0.71%, 4.61%]) were threatened through vulnerable APIs, demonstrating that VAScanner can potentially reduce false positives significantly.
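The backward call graph analysis described above can be illustrated with a small reachability sketch: invert the call edges and walk from the vulnerable root methods to every public API that can reach them. The graph shape, method names, and the `vulnerable_apis` helper are hypothetical; VAScanner's real analysis operates on Java call graphs.

```python
from collections import deque

def vulnerable_apis(calls, roots, public_apis):
    """Flag public APIs from which a vulnerable root method is reachable.

    `calls` maps caller -> set of callees; `roots` are the vulnerable
    root methods extracted from patch commits.
    """
    # Invert edges: callee -> set of callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    # BFS backward from the vulnerable roots.
    seen = set(roots)
    queue = deque(roots)
    while queue:
        method = queue.popleft()
        for c in callers.get(method, ()):
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return seen & set(public_apis)
```

For example, if `api.a` calls `internal.x`, which calls `vuln.root`, only `api.a` is flagged even when other public APIs exist.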
IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2906-2920.
Citations: 0
Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction
IF 6.5 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-04 DOI: 10.1109/TSE.2024.3450837
Sungmin Kang;Juyeon Yoon;Nargiz Askarbekkyzy;Shin Yoo
Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often written in natural language and thus can be difficult to transform into test cases consistently. As a result, existing techniques have mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), which have been demonstrated to be adept at natural language processing and code generation. By prompting LLMs to generate bug-reproducing tests, and via a post-processing pipeline that automatically identifies promising generated tests, our proposed technique Libro could successfully reproduce about one-third of all bugs in the widely used Defects4J benchmark. Furthermore, our extensive evaluation of 15 LLMs, including 11 open-source LLMs, suggests that open-source LLMs also demonstrate substantial potential, with the StarCoder LLM achieving 70% of the reproduction performance of the closed-source OpenAI LLM code-davinci-002 on the large Defects4J benchmark, and 90% of its performance on a held-out bug dataset likely not part of any LLM's training data. In addition, our experiments on LLMs of different sizes show that bug reproduction using Libro improves as LLM size increases, providing information as to which LLMs can be used with the Libro pipeline.
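The post-processing idea, running every generated test against the buggy program and favoring failing tests that agree with each other, can be sketched as follows. The `run` callback and the agreement-by-failure-message heuristic are simplifying assumptions for illustration; Libro's actual pipeline uses additional signals.

```python
def select_promising(tests, run):
    """Rank LLM-generated tests by how likely they reproduce the report.

    `run(test)` returns the failure message when the test fails on the
    buggy program, or None when it passes. Failing tests are grouped by
    failure message; larger groups rank first (agreement heuristic),
    and one representative per group is returned.
    """
    groups = {}
    for t in tests:
        msg = run(t)
        if msg is not None:  # only failing tests can reproduce the bug
            groups.setdefault(msg, []).append(t)
    ranked = sorted(groups.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked]
```

A test that passes on the buggy version cannot demonstrate the bug, so it is discarded outright; agreement among independently generated failing tests then serves as a cheap proxy for plausibility.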
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2677-2694.
Citations: 0
RLocator: Reinforcement Learning for Bug Localization
IF 6.5 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-30 DOI: 10.1109/TSE.2024.3452595
Partha Chakraborty;Mahmoud Alfadel;Meiyappan Nagappan
Software developers spend a significant portion of their time fixing bugs in their projects. To streamline this process, bug localization approaches have been proposed to identify the source code files that are likely responsible for a particular bug. Prior work proposed several similarity-based machine-learning techniques for bug localization. Despite significant advances in these techniques, they do not directly optimize the evaluation measures. We argue that directly optimizing evaluation measures can positively contribute to the performance of bug localization approaches. Therefore, in this paper, we utilize Reinforcement Learning (RL) techniques to directly optimize the ranking metrics. We propose RLocator, a Reinforcement Learning-based bug localization approach. We formulate RLocator using a Markov Decision Process (MDP) to optimize the evaluation measures directly. We present the technique and experimentally evaluate it on a benchmark dataset of 8,316 bug reports from six highly popular Apache projects. The results of our evaluation reveal that RLocator achieves a Mean Reciprocal Rank (MRR) of 0.62, a Mean Average Precision (MAP) of 0.59, and a Top-1 score of 0.46. We compare RLocator with three state-of-the-art bug localization tools: FLIM, BugLocator, and BL-GAN. Our evaluation reveals that RLocator outperforms these approaches by a substantial margin, with improvements of 38.3% in MAP, 36.73% in MRR, and 23.68% in the Top-K metric. These findings highlight that directly optimizing evaluation measures contributes considerably to improving performance on the bug localization problem.
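The evaluation measures named above (MRR, MAP, Top-K) can be computed as follows. This is a sketch of the standard information-retrieval formulas, not code from the paper; each bug is represented by the sorted 1-based ranks of its relevant files in the tool's output.

```python
def mrr(rankings):
    # Mean Reciprocal Rank: average of 1/rank of the first relevant file.
    return sum(1.0 / ranks[0] for ranks in rankings) / len(rankings)

def average_precision(ranks):
    # `ranks`: sorted 1-based positions of all relevant files for one bug.
    # Precision at the i-th relevant hit is (i+1) / ranks[i].
    return sum((i + 1) / r for i, r in enumerate(ranks)) / len(ranks)

def map_score(rankings):
    # Mean Average Precision across all bugs.
    return sum(average_precision(r) for r in rankings) / len(rankings)

def top_k(rankings, k):
    # Fraction of bugs whose best-ranked relevant file is within the top k.
    return sum(1 for ranks in rankings if ranks[0] <= k) / len(rankings)
```

Because all three metrics reward placing relevant files early, they are natural candidates for the direct optimization the paper argues for.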
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2695-2708.
Citations: 0
Leveraging Large Language Model for Automatic Patch Correctness Assessment
IF 6.5 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-30 DOI: 10.1109/TSE.2024.3452252
Xin Zhou;Bowen Xu;Kisub Kim;DongGyun Han;Hung Huu Nguyen;Thanh Le-Cong;Junda He;Bach Le;David Lo
Automated Program Repair (APR) techniques have shown increasingly promising results in fixing real-world bugs. Despite their effectiveness, APR techniques still face an overfitting problem: a generated patch can be incorrect even though it passes all tests. It is time-consuming to manually evaluate the correctness of generated patches that pass all available test cases. To address this problem, many approaches have been proposed to automatically assess the correctness of patches generated by APR techniques. These approaches are mainly evaluated within the cross-validation setting. However, for patches generated by a new or unseen APR tool, users are implicitly required to manually label a significant portion of these patches (e.g., 90% in 10-fold cross-validation) in the cross-validation setting before inferring the remaining patches (e.g., 10% in 10-fold cross-validation). To mitigate this issue, in this study, we propose LLM4PatchCorrect, a patch correctness assessment approach that adopts a large language model for code. Specifically, for patches generated by a new or unseen APR tool, LLM4PatchCorrect does not need labeled patches from this tool for training; instead, it directly queries the large language model for code to obtain predictions of the correctness labels without training. In this way, LLM4PatchCorrect can reduce the manual labeling effort when building a model to automatically assess the correctness of patches generated by new APR tools. To provide knowledge about the automatic patch correctness assessment (APCA) task to the large language model for code, LLM4PatchCorrect leverages bug descriptions, execution traces, failing test cases, test coverage, and labeled patches generated by existing APR tools before deciding the correctness of the unlabeled patches of a new or unseen APR tool.
Additionally, LLM4PatchCorrect prioritizes labeled patches from existing APR tools that exhibit semantic similarity to those generated by new APR tools, enhancing the accuracy achieved for patches from new APR tools. Our experimental results show that LLM4PatchCorrect achieves an accuracy of 84.4% and an F1-score of 86.5% on average, even though no labeled patch of the new or unseen APR tool is available. In addition, our proposed technique significantly outperformed the prior state-of-the-art.
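The prioritization of semantically similar labeled patches can be sketched as a nearest-neighbor lookup over patch embeddings. The embedding vectors and the `top_similar` helper are assumptions for illustration; the paper's actual encoder and selection details may differ.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_similar(query_vec, labeled, k):
    """Pick the k labeled patches most similar to the query patch.

    `labeled` is a list of (embedding, correctness_label) pairs; the
    embeddings stand in for encodings from a code language model. The
    selected examples would then be placed in the LLM's prompt.
    """
    ranked = sorted(labeled, key=lambda p: cosine(query_vec, p[0]),
                    reverse=True)
    return ranked[:k]
```

Retrieving in-context examples this way lets the approach condition the LLM on the most relevant labeled patches without any fine-tuning on the new tool's output.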
IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2865-2883.
Citations: 0
3Erefactor: Effective, Efficient and Executable Refactoring Recommendation for Software Architectural Consistency
IF 6.5 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-28 DOI: 10.1109/TSE.2024.3449564
Jingwen Liu;Wuxia Jin;Junhui Zhou;Qiong Feng;Ming Fan;Haijun Wang;Ting Liu
As software continues to evolve and business functions become increasingly complex, architectural inconsistency arises when the implementation architecture deviates from the intended architecture design. This architectural problem makes maintenance difficult and requires significant refactoring effort. To assist labor-intensive refactoring, automated refactoring has received much attention, for example searching for optimal refactoring solutions. However, three limitations remain: the recommended refactorings are insufficiently effective in addressing architectural consistency; the search process for refactoring solutions is inefficient; and there is a lack of executable refactoring solutions. To address these limitations, we propose an effective, efficient, and executable refactoring recommendation approach, named 3Erefactor, for software architectural consistency. To achieve effective refactoring, 3Erefactor uses NSGA-II to generate refactoring solutions that minimize architectural inconsistencies at the module level and the entity level. To achieve efficient refactoring, 3Erefactor leverages an architecture recovery technique to locate files requiring refactoring, helping accelerate the convergence of the refactoring algorithm. To achieve executable refactoring, 3Erefactor applies a set of refactoring executability constraint strategies during refactoring solution search and generation, including improving refactoring pre-conditions and removing invalid operations from refactoring solutions. We evaluated our approach on six open source systems. Statistical analysis of our experiments shows that the refactoring solutions generated by 3Erefactor performed significantly better than three state-of-the-art approaches in terms of reducing the number of architectural inconsistencies, improving the efficiency of the refactoring algorithm, and improving the executability of refactorings.
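The multi-objective search at the core of the approach rests on Pareto dominance over the two inconsistency objectives. A minimal sketch of the non-dominated filtering used by NSGA-II-style algorithms (not 3Erefactor's full implementation) is:

```python
def dominates(a, b):
    # `a` dominates `b` when it is no worse in every objective and
    # strictly better in at least one; both objectives are minimized.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep the non-dominated refactoring solutions.

    Each solution is a tuple of objective values, e.g. (module-level
    inconsistencies, entity-level inconsistencies), both to minimize.
    """
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]
```

NSGA-II repeatedly applies this kind of non-dominated sorting (plus crowding distance) to keep a diverse set of trade-offs between the two inconsistency levels rather than collapsing to a single weighted score.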
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2633-2655.
Citations: 0
Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning
IF 6.5, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-08-27. DOI: 10.1109/TSE.2024.3449917
Weifeng Sun;Zhenting Guo;Meng Yan;Zhongxin Liu;Yan Lei;Hongyu Zhang
Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face the following challenges: ① Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. ② Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, limiting their usability. This limitation is significant for downstream tasks requiring massive test-code pairs, as not all projects can meet these requirements. To tackle the abovementioned limitations, we propose a novel static method-level TCTL approach, named TestLinker. For the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework to accommodate different project types in a triage manner. As for the second challenge, we employ the semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration. 
Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which only leverages static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.
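For contrast with the learned semantic correlations, the naming-convention heuristic that static TCTL baselines rely on fits in a few lines, together with a crude lexical stand-in for semantic relatedness. The helper names and the tokenizing regular expression here are illustrative assumptions, not TestLinker's API.

```python
import re

# Naming-convention linking (the static baseline the abstract mentions):
# strip a test-name marker and look for an identically named focal method.

def strip_test_marker(test_name: str) -> str:
    """Map a unit-test name like 'testParseHeader' to 'parseheader'."""
    name = test_name
    for prefix in ("test_", "test"):
        if name.lower().startswith(prefix):
            name = name[len(prefix):]
            break
    return name.lstrip("_").lower()

def link_by_convention(test_name, production_methods):
    """Return every production method whose name matches the stripped test name."""
    target = strip_test_marker(test_name)
    return [m for m in production_methods if m.lower() == target]

# A crude lexical proxy for semantic relatedness: Jaccard overlap of
# camelCase/snake_case name tokens. Semantic correlation learning would
# use embeddings from pre-trained code models instead.

def name_tokens(name: str) -> set:
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*", name)}

def token_overlap(a: str, b: str) -> float:
    ta, tb = name_tokens(a), name_tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
```

The failure mode the abstract highlights is visible here: a test named `checkHeaderRoundTrip` defeats the naming convention entirely, which is exactly where a learned semantic correlation has to take over.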
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2656-2676.
Citations: 0
Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration
IF 6.5, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-08-23. DOI: 10.1109/TSE.2024.3445338
Matteo Paltenghi;Rahul Pandita;Austin Z. Henley;Albert Ziegler
Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning and the way their attention mechanism scans the code match the patterns of developers. A poor understanding of the model reasoning process limits the way in which current neural models are leveraged today, so far mostly for their raw predictions. To fill this gap, this work studies how the processed attention signal of three open large language models - CodeGen, InCoder and GPT-J - agrees with how developers look at and explore code when each answers the same sensemaking questions about code. Furthermore, we contribute an open-source eye-tracking dataset comprising 92 manually labeled sessions from 25 developers engaged in sensemaking tasks. We empirically evaluate five heuristics that do not use attention and ten attention-based post-processing approaches applied to CodeGen's attention signal against our ground truth of developers exploring code, including the novel concept of follow-up attention, which exhibits the highest agreement between model and human attention. Our follow-up attention method can predict the next line a developer will look at with 47% accuracy. This outperforms the baseline prediction accuracy of 42.3%, which uses the session history of other developers to recommend the next line. These results demonstrate the potential of leveraging the attention signal of pre-trained models for effective code exploration.
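A minimal version of turning an attention matrix into a next-line prediction might look as follows. The aggregation rule (take the strongest attention from any token on the current line to any token on a candidate line) is an assumption chosen for illustration, not the paper's exact definition of follow-up attention.

```python
import numpy as np

# Toy next-line prediction from a token-level attention matrix.
# attention[i, j] = weight from token i to token j;
# line_of_token[k] = source line on which token k appears.

def next_line_scores(attention, line_of_token, current_line):
    """Score every line by the strongest attention leaving current_line."""
    n_lines = max(line_of_token) + 1
    scores = np.zeros(n_lines)
    for i, src_line in enumerate(line_of_token):
        if src_line != current_line:
            continue
        for j, dst_line in enumerate(line_of_token):
            if dst_line != current_line:
                scores[dst_line] = max(scores[dst_line], attention[i, j])
    return scores

def predict_next_line(attention, line_of_token, current_line):
    return int(np.argmax(next_line_scores(attention, line_of_token, current_line)))

# Four tokens spread over three lines; the tokens on line 0 attend most
# strongly to the token on line 2, so line 2 is predicted next.
att = np.array([
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.0, 0.1, 0.8],
    [0.2, 0.2, 0.0, 0.6],
    [0.3, 0.3, 0.4, 0.0],
])
lines = [0, 0, 1, 2]
```

Evaluating such a predictor against eye-tracking data then amounts to checking, per fixation, whether the developer's next line matches the argmax.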
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2568-2582. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10645745
Citations: 0
EpiTESTER: Testing Autonomous Vehicles With Epigenetic Algorithm and Attention Mechanism
IF 6.5, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-08-23. DOI: 10.1109/TSE.2024.3449429
Chengjie Lu;Shaukat Ali;Tao Yue
Testing autonomous vehicles (AVs) under various environmental scenarios that lead the vehicles to unsafe situations is challenging. Given the infinite possible environmental scenarios, it is essential to find critical scenarios efficiently. To this end, we propose a novel testing method, named EpiTESTER, by taking inspiration from epigenetics, which enables species to adapt to sudden environmental changes. In particular, EpiTESTER adopts gene silencing as its epigenetic mechanism, which regulates gene expression to prevent the expression of a certain gene, and the probability of gene expression is dynamically computed as the environment changes. Given different data modalities (e.g., images, lidar point clouds) in the context of AV, EpiTESTER benefits from a multi-model fusion transformer to extract high-level feature representations from environmental factors. Next, it calculates probabilities based on these features with the attention mechanism. To assess the cost-effectiveness of EpiTESTER, we compare it with a probabilistic search algorithm (Simulated Annealing, SA), a classical genetic algorithm (GA) (i.e., without any epigenetic mechanism implemented), and EpiTESTER with equal probability for each gene. We evaluate EpiTESTER with six initial environments from CARLA, an open-source simulator for autonomous driving research, and two end-to-end AV controllers, Interfuser and TCP. Our results show that EpiTESTER achieved a promising performance in identifying critical scenarios compared to the baselines, showing that applying epigenetic mechanisms is a good option for solving practical problems.
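The gene-silencing idea can be illustrated in a few lines. The fixed per-gene silencing probabilities below are a stand-in for EpiTESTER's probabilities, which are computed dynamically by the attention-based fusion transformer as the environment changes; the gene names are illustrative assumptions.

```python
import random

# Sketch of gene silencing on a GA chromosome of environment parameters.
# A silenced gene is not expressed: the phenotype falls back to a baseline
# value instead of the evolved one.

def express(genes, silence_probs, baselines, rng):
    """Silence gene i with probability silence_probs[i]."""
    assert len(genes) == len(silence_probs) == len(baselines)
    return [b if rng.random() < p else g
            for g, p, b in zip(genes, silence_probs, baselines)]

rng = random.Random(42)
genes     = [0.9, 0.1, 0.5]   # e.g., fog density, rain intensity, sun angle
baselines = [0.0, 0.0, 0.5]   # environment defaults when a gene is silenced
phenotype = express(genes, [0.0, 1.0, 0.5], baselines, rng)
```

In the full algorithm the silencing probabilities would be recomputed at each step from the fused sensor features, so the same chromosome can express differently as the simulated scene evolves.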
IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2614-2632.
Citations: 0