
Latest Publications: ACM Transactions on Software Engineering and Methodology

A Tale of Two Comprehensions? Analyzing Student Programmer Attention during Code Summarization
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-15 | DOI: 10.1145/3664808
Zachary Karas, Aakash Bansal, Yifan Zhang, Toby Li, Collin McMillan, Yu Huang

Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension, and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., writing). However, there is currently a gap in studying programmers’ attention as they read code with pre-written summaries (i.e., reading). As a result, it is currently unknown how these two forms of code comprehension, reading and writing, compare. Also, there is a limited understanding of programmer attention in code summarization with respect to program semantics. We address these gaps in this paper with a human eye-tracking study (n = 27) comparing reading and writing. We examined programmer attention with respect to fine-grained program semantics, including their attention sequence (i.e., scan path). We find distinctions in programmer attention between the comprehension tasks, similarities in reading patterns between them, and differences mediated by expertise. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree (AST) to explore another representation of human attention. Some significant differences in programmer attention on the raw code are not significant on the AST, while others become more pronounced.
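The abstract's mapping of gaze data onto the AST can be pictured with a minimal sketch: given fixations already converted to source line numbers, find the narrowest AST node whose span covers each fixation. This is not the authors' toolchain (the study used Java code and an eye tracker); the snippet below uses Python's built-in `ast` module and a hypothetical `fixations` list purely for illustration.

```python
import ast

# Hypothetical fixations, already converted from screen coordinates to source
# line numbers by the eye-tracking pipeline (a real mapping would use columns too).
fixations = [3, 4, 5]

source = """\
def total(items):
    s = 0
    for x in items:
        s = s + x.price
    return s
"""

tree = ast.parse(source)

def narrowest_node(tree, line):
    """Return the AST node with the smallest line span that covers `line`."""
    best, best_span = None, None
    for node in ast.walk(tree):
        start = getattr(node, "lineno", None)
        end = getattr(node, "end_lineno", None)
        if start is None or end is None or not (start <= line <= end):
            continue
        span = end - start
        if best is None or span < best_span:
            best, best_span = node, span
    return best

for line in fixations:
    node = narrowest_node(tree, line)
    print(f"fixation on line {line} -> {type(node).__name__}")
```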

Citations: 0
A Formal Explainer for Just-In-Time Defect Predictions
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-14 | DOI: 10.1145/3664809
Jinqiang Yu, Michael Fu, Alexey Ignatiev, Chakkrit Tantithamthavorn, Peter Stuckey

Just-In-Time (JIT) defect prediction has been proposed to help teams to prioritize the limited resources on the most risky commits (or pull requests), yet it remains largely a black-box, whose predictions are not explainable nor actionable to practitioners. Thus, prior studies have applied various model-agnostic techniques to explain the predictions of JIT models. Yet, explanations generated from existing model-agnostic techniques are still not formally sound, robust, and actionable. In this paper, we propose FoX, a Formal eXplainer for JIT Defect Prediction, which builds on formal reasoning about the behaviour of JIT defect prediction models and hence is able to provide provably correct explanations, which are additionally guaranteed to be minimal. Our experimental results show that FoX is able to efficiently generate provably-correct, robust, and actionable explanations while existing model-agnostic techniques cannot. Our survey study with 54 software practitioners provides valuable insights into the usefulness and trustworthiness of our FoX approach. 86% of participants agreed that our approach is useful, while 74% of participants found it trustworthy. Thus, this paper serves as an important stepping stone towards trustable explanations for JIT models to help domain experts and practitioners better understand why a commit is predicted as defective and what to do to mitigate the risk.
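The notion of a provably correct, minimal explanation can be sketched informally as an abductive explanation: a smallest set of feature values that forces the model's prediction no matter how the remaining features vary. The toy example below brute-forces that check over small, hand-picked feature domains and a stand-in rule-based "model"; FoX itself reasons formally over real JIT models, so everything here (feature names, domains, the predict rule) is an illustrative assumption.

```python
from itertools import product

# Hypothetical commit features and a hand-written stand-in "JIT model", used only
# to illustrate a minimal sufficient (abductive) explanation.
DOMAINS = {
    "lines_added":   [5, 50, 500],
    "files_touched": [1, 3, 10],
    "author_exp":    [1, 10, 100],
}

def predict(x):
    # toy rule: large changes, or scattered changes by inexperienced authors, are risky
    return x["lines_added"] >= 500 or (x["files_touched"] >= 10 and x["author_exp"] <= 1)

def is_sufficient(instance, kept):
    """True if fixing only the `kept` features forces the same prediction."""
    target = predict(instance)
    free = [f for f in DOMAINS if f not in kept]
    for values in product(*(DOMAINS[f] for f in free)):
        x = dict(instance)
        x.update(zip(free, values))
        if predict(x) != target:
            return False
    return True

def minimal_explanation(instance):
    kept = set(DOMAINS)               # start with all features
    for f in list(DOMAINS):           # greedily drop features that are not needed
        if is_sufficient(instance, kept - {f}):
            kept.remove(f)
    return {f: instance[f] for f in kept}

commit = {"lines_added": 500, "files_touched": 3, "author_exp": 10}
print(minimal_explanation(commit))    # {'lines_added': 500} is enough to force "risky"
```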

Citations: 0
Mobile Application Online Cross-Project Just-in-Time Software Defect Prediction Framework
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-14 | DOI: 10.1145/3664607
Siyu Jiang, Zhenhang He, Yuwen Chen, Mingrong Zhang, Le Ma

As mobile applications evolve rapidly, their fast iterative update nature leads to an increase in software defects. Just-In-Time Software Defect Prediction (JIT-SDP) offers immediate feedback on code changes. For new applications without historical data, researchers have proposed Cross-Project JIT-SDP (CP JIT-SDP). Existing CP JIT-SDP approaches are designed for offline scenarios where target data is available in advance. However, target data in real-world applications usually arrives online in a streaming manner, so online CP JIT-SDP faces cross-project distribution differences and concept drift in the target project data. These challenges often co-exist during application development, and their interactions cause model performance to degrade. To address these issues, we propose an online CP JIT-SDP framework called COTL. Specifically, COTL consists of two stages: offline and online. In the offline stage, a cross-domain structure-preserving projection algorithm is used to reduce cross-project distribution differences. In the online stage, target data arrive sequentially over time. By reducing the differences in marginal and conditional distributions between offline and online data for the target project, concept drift is mitigated and the classifier weights are updated online. Experimental results on 15 mobile application benchmark datasets show that COTL outperforms 13 benchmark methods on four performance metrics.
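A minimal sketch of the online stage's incremental update loop, assuming scikit-learn and synthetic data: a classifier trained offline on source-project commits is updated one target commit at a time as labels become available. COTL's cross-domain structure-preserving projection and distribution alignment are omitted; the features, labels, and drift simulated below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Offline stage stand-in: labelled source-project commits (features + label).
X_source = rng.normal(size=(500, 8))
y_source = (X_source[:, 0] + 0.5 * X_source[:, 1] > 0).astype(int)

clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X_source, y_source, classes=[0, 1])

# Online stage stand-in: target-project commits arrive one at a time; the true
# label (defect-inducing or not) only becomes available later, after which the
# classifier weights are updated incrementally.
def target_stream(n=200):
    for _ in range(n):
        x = rng.normal(loc=0.3, size=(1, 8))          # slight distribution shift
        y = int(x[0, 0] + 0.5 * x[0, 1] > 0.2)         # drifted labelling rule
        yield x, y

correct = 0
for i, (x, y) in enumerate(target_stream(), start=1):
    pred = clf.predict(x)[0]          # predict first ...
    correct += int(pred == y)
    clf.partial_fit(x, [y])           # ... then update once the label arrives
print(f"online accuracy: {correct / i:.2f}")
```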

Citations: 0
LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664812
Xiaoning Feng, Xiaohong Han, Simin Chen, Wei Yang

Large Language Models (LLMs) have received much recent attention due to their human-level accuracy. While existing works mostly focus on either improving accuracy or testing accuracy robustness, the computation efficiency of LLMs, which is of paramount importance due to often vast generation demands and real-time requirements, has surprisingly received little attention. In this paper, we make the first attempt to understand and test potential computation efficiency robustness in state-of-the-art LLMs. By analyzing the working mechanism and implementation of 20,543 publicly accessible LLMs, we observe a fundamental property in LLMs that could be manipulated in an adversarial manner to reduce computation efficiency significantly. Our interesting observation is that the output length, rather than the input, determines the computation efficiency of LLMs, where the output length depends on two factors: an often sufficiently large yet pessimistic pre-configured threshold controlling the maximum number of iterations, and a runtime-generated end-of-sentence (EOS) token. Our key motivation is to generate test inputs that could sufficiently delay the generation of EOS such that LLMs would have to go through enough iterations to satisfy the pre-configured threshold. We present LLMEffiChecker, which can work under both white-box and black-box settings. In the white-box scenario, LLMEffiChecker develops a gradient-guided technique that searches for a minimal and unnoticeable perturbation at the character, token, and structure levels. In the black-box scenario, LLMEffiChecker employs a causal inference-based approach to find critical tokens and similarly applies three levels of imperceptible perturbation to them. Both the white-box and black-box settings effectively delay the appearance of EOS, compelling these inputs to reach the naturally unreachable threshold. To demonstrate the effectiveness of LLMEffiChecker, we conduct a systematic evaluation on nine publicly available LLMs: Google T5, AllenAI WMT14, Helsinki-NLP translator, Facebook FairSeq, UNICAMP-DL translator, MarianMT, Google FLAN-T5, MBZUAI LaMini-GPT and Salesforce CodeGen. Experimental results show that LLMEffiChecker can increase LLMs’ response latency and energy consumption by, on average, 325% to 3244% and 344% to 3616%, respectively, by perturbing just one character or token in the input sentence. Our case study shows that inputs generated by LLMEffiChecker significantly affect the battery power of real-world mobile devices (i.e., draining more than 30 times the battery power of normal inputs).
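The core measurement, that a one-character change can inflate output length and hence latency, can be sketched in a black-box fashion with Hugging Face Transformers. The snippet below simply tries random single-character flips on a small public model (t5-small, an assumed stand-in) and reports the change in generated token count; it does not implement LLMEffiChecker's gradient-guided or causal-inference search.

```python
import random
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative only: compare how many tokens the model generates for the original
# input versus inputs with a single character flipped at random.
model_name = "t5-small"                      # assumed stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

def generated_length(text, max_new_tokens=200):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return out.shape[-1]                     # number of generated token ids

sentence = "translate English to German: The weather is nice today."
base_len = generated_length(sentence)

random.seed(0)
for _ in range(5):
    i = random.randrange(len(sentence))
    mutated = sentence[:i] + random.choice("abcdefghijklmnopqrstuvwxyz") + sentence[i + 1:]
    print(f"{generated_length(mutated) - base_len:+d} tokens for a flip at position {i}")
```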

Citations: 0
Testing Updated Apps by Adapting Learned Models
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664601
Chanh Duc Ngo, Fabrizio Pastore, Lionel Briand

Although App updates are frequent and software engineers would like to verify updated features only, automated testing techniques verify entire Apps and thus waste resources.

We present Continuous Adaptation of Learned Models (CALM), an automated App testing approach that efficiently tests App updates by adapting App models learned when automatically testing previous App versions. CALM focuses on functional testing. Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers while maximizing the percentage of updated methods and instructions exercised.

Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches, for the same maximum number of App screens to be visually inspected. Further, in common update scenarios, where only a small fraction of methods are updated, CALM outperforms all competing approaches even more quickly and by a larger margin.
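One way to picture the trade-off CALM optimizes, few screens inspected versus many updated methods exercised, is a greedy set cover over a learned screen-to-updated-methods map. The sketch below is only that conceptual picture with hypothetical screen names and method sets; it is not CALM's actual model-adaptation algorithm.

```python
# Conceptual sketch: for each GUI screen in a previously learned App model, an
# estimate of which updated methods its actions exercise (all names hypothetical).
screens_to_updated_methods = {
    "LoginScreen":    {"AuthManager.refreshToken"},
    "SettingsScreen": {"Prefs.migrate", "Theme.applyDark"},
    "CheckoutScreen": {"Cart.recalculate", "Prefs.migrate"},
    "ProfileScreen":  set(),
}

def pick_screens(coverage_map, budget):
    """Greedily choose at most `budget` screens covering the most updated methods."""
    uncovered = set().union(*coverage_map.values())
    chosen = []
    while uncovered and len(chosen) < budget:
        best = max(coverage_map, key=lambda s: len(coverage_map[s] & uncovered))
        gained = coverage_map[best] & uncovered
        if not gained:
            break
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

chosen, missed = pick_screens(screens_to_updated_methods, budget=2)
print("screens to inspect:", chosen)
print("updated methods still uncovered:", missed)
```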

Citations: 0
What Makes a Good TODO Comment?
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664811
Haoye Wang, Zhipeng Gao, Tingting Bi, John Grundy, Xinyu Wang, Minghui Wu, Xiaohu Yang

Software development is a collaborative process that involves various interactions among individuals and teams. TODO comments in source code play a critical role in managing and coordinating diverse tasks during this process. However, this study finds that a large proportion of open-source project TODO comments are left unresolved or take a long time to be resolved. About 46.7% of TODO comments in open-source repositories are of low quality (e.g., TODOs that are ambiguous, lack information, or are useless to developers). This highlights the need for better TODO practices. In this study, we investigate four aspects regarding the quality of TODO comments in open-source projects: (1) the prevalence of low-quality TODO comments; (2) the key characteristics of high-quality TODO comments; (3) how TODO comments of different quality are managed in practice; and (4) the feasibility of automatically assessing TODO comment quality. Examining 2,863 TODO comments from Top100 GitHub Java repositories, we propose criteria to identify high-quality TODO comments and provide insights into their optimal composition. We discuss the lifecycle of TODO comments with varying quality. To assist developers, we construct deep learning-based methods that show promising performance in identifying the quality of TODO comments, potentially enhancing development efficiency and code quality.
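As a rough, non-learned counterpart to the paper's classifiers, the sketch below scans Java files for TODO comments with a regular expression and flags obviously weak ones using simple heuristics (very short text, no action-like verb). The heuristics and word list are assumptions for illustration, not the quality criteria derived in the study.

```python
import re
from pathlib import Path

TODO_RE = re.compile(r"//\s*TODO[:\s]*(.*)", re.IGNORECASE)

# Very rough heuristics: a TODO that is empty, extremely short, or has no
# verb-like word is likely to be ambiguous or useless.
ACTION_WORDS = {"fix", "add", "remove", "refactor", "check", "handle", "support",
                "implement", "update", "replace", "clean", "move", "rename"}

def rate_todo(text):
    words = re.findall(r"[a-zA-Z]+", text.lower())
    if len(words) < 3:
        return "low"
    if not any(w in ACTION_WORDS for w in words):
        return "low"
    return "ok"

def scan(repo_root):
    """Yield (file, line number, TODO text, heuristic rating) for Java sources."""
    for path in Path(repo_root).rglob("*.java"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            m = TODO_RE.search(line)
            if m:
                yield path, lineno, m.group(1).strip(), rate_todo(m.group(1))

if __name__ == "__main__":
    for path, lineno, text, quality in scan("."):
        print(f"{quality:>3}  {path}:{lineno}  TODO {text}")
```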

Citations: 0
Meta-Learning for Multi-Family Android Malware Classification
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664806
Yao Li, Dawei Yuan, Tao Zhang, Haipeng Cai, David Lo, Cuiyun Gao, Xiapu Luo, He Jiang

With the emergence of smartphones, Android has become a widely used mobile operating system. However, it is vulnerable to various types of attacks. Every day, new malware threatens the security of users’ devices and private data. Many methods have been proposed to classify malicious applications using static or dynamic analysis. However, previous methods still suffer from unsatisfactory performance due to two challenges. First, they are unable to address the imbalanced data distribution problem, leading to poor performance for malware families with few members. Second, they are unable to address the zero-day malware classification problem (zero-day malware refers to malicious applications that exploit unknown vulnerabilities). In this paper, we introduce an innovative meta-learning approach for multi-family Android malware classification named Meta-MAMC, which uses meta-learning technology to learn meta-knowledge (i.e., the similarities and differences among different malware families) of few-sample families and combines new sampling algorithms to solve the above challenges. Meta-MAMC integrates (i) the meta-knowledge contained within the dataset to guide models in learning to identify unknown malware, and (ii) more accurate and diverse tasks based on novel sampling strategies, as well as directly adapting meta-learning to new few-sample and zero-sample tasks to classify families. We have evaluated Meta-MAMC on two popular datasets and a corpus of real-world Android applications. The results demonstrate its efficacy in accurately classifying malicious applications belonging to certain malware families, even achieving 100% classification accuracy for some families.
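The episodic, few-shot task construction that meta-learning approaches such as Meta-MAMC build on can be sketched generically: sample N families, then K support and a few query samples per family. The snippet below shows only this N-way K-shot sampling over hypothetical feature vectors; Meta-MAMC's own sampling strategies and classifier are not reproduced here.

```python
import random
from collections import defaultdict

# Hypothetical family-labelled feature vectors; in practice these would be
# features extracted from APKs.
random.seed(0)
samples = [([random.random() for _ in range(4)], f"family_{i % 6}") for i in range(120)]

by_family = defaultdict(list)
for x, fam in samples:
    by_family[fam].append(x)

def sample_episode(n_way=3, k_shot=2, n_query=3):
    """Build one N-way K-shot episode: support set for adaptation, query set for evaluation."""
    families = random.sample(list(by_family), n_way)
    support, query = [], []
    for fam in families:
        chosen = random.sample(by_family[fam], k_shot + n_query)
        support += [(x, fam) for x in chosen[:k_shot]]
        query += [(x, fam) for x in chosen[k_shot:]]
    return support, query

support, query = sample_episode()
print(len(support), "support and", len(query), "query samples from",
      {fam for _, fam in support})
```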

Citations: 0
On the Model Update Strategies for Supervised Learning in AIOps Solutions
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664599
Yingzhe Lyu, Heng Li, Zhen Ming (Jack) Jiang, Ahmed Hassan

AIOps (Artificial Intelligence for IT Operations) solutions leverage the massive data produced during the operation of large-scale systems and machine learning models to assist software engineers in their system operations. As operation data produced in the field are constantly evolving due to factors such as the changing operational environment and user base, the models in AIOps solutions need to be constantly maintained after deployment. While prior works focus on innovative modeling techniques to improve the performance of AIOps models before releasing them into the field, when and how to update AIOps models remain an under-investigated topic. In this work, we performed a case study on three large-scale public operation data: two trace datasets from the cloud computing platforms of Google and Alibaba and one disk stats dataset from the BackBlaze cloud storage data center. We empirically assessed five different types of model update strategies for supervised learning regarding their performance, updating cost, and stability. We observed that active model update strategies (e.g., periodical retraining, concept drift guided retraining, time-based model ensembles, and online learning) achieve better and more stable performance than a stationary model. Particularly, applying sophisticated model update strategies (e.g., concept drift detection, time-based ensembles, and online learning) could provide better performance, efficiency, and stability than simply retraining AIOps models periodically. In addition, we observed that, although some update strategies (e.g., time-based ensemble and online learning) can save model training time, they significantly sacrifice model testing time, which could hinder their applications in AIOps solutions where the operation data arrive at high pace and volume and where immediate inferences are required. Our findings highlight that practitioners should consider the evolution of operation data and actively maintain AIOps models over time. Our observations can also guide researchers and practitioners in investigating more efficient and effective model update strategies that fit in the context of AIOps.
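A minimal sketch of why active update strategies help, assuming scikit-learn and synthetic drifting data: compare a stationary model trained once with a model periodically retrained on all data seen so far. The drift pattern and features below are fabricated for illustration and are unrelated to the Google, Alibaba, or BackBlaze datasets used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def make_period(t, n=300):
    """Synthetic operation data whose decision boundary drifts over time period t."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.3 * t * X[:, 1] > 0).astype(int)
    return X, y

periods = [make_period(t) for t in range(6)]
X0, y0 = periods[0]

# Stationary strategy: train once on the first period and never update.
stationary = RandomForestClassifier(random_state=0).fit(X0, y0)

for t in range(1, 6):
    X_test, y_test = periods[t]
    # Periodical retraining: refit on all data observed before period t.
    X_hist = np.vstack([periods[i][0] for i in range(t)])
    y_hist = np.concatenate([periods[i][1] for i in range(t)])
    retrained = RandomForestClassifier(random_state=0).fit(X_hist, y_hist)
    print(f"period {t}: stationary={stationary.score(X_test, y_test):.2f} "
          f"retrained={retrained.score(X_test, y_test):.2f}")
```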

Citations: 0
Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664810
Han Hu, Han Wang, Ruiqi Dong, Xiao Chen, Chunyang Chen

Mobile apps are ubiquitous in our daily lives for supporting different tasks such as reading and chatting. Despite the availability of many GUI testing tools, app testers still struggle with low testing code coverage due to tools frequently getting stuck in loops or overlooking activities with concealed entries. This results in a significant amount of testing time being spent on redundant and repetitive exploration of a few GUI pages. To address this, we utilize Android’s deep links, which assist in triggering Android intents to lead users to specific pages, and introduce a deep link-enhanced exploration method. This approach, integrated into the testing tool Monkey, gives rise to Delm (Deep Link-enhanced Monkey). Delm oversees the dynamic exploration process, guiding the tool out of meaningless testing loops to unexplored GUI pages. We provide a rigorous activity context mock-up approach for triggering existing Android intents to discover more activities with hidden entrances. We conduct experiments to evaluate Delm’s effectiveness on activity context mock-up, activity coverage, method coverage, and crash detection. The findings reveal that Delm can mock up more complex activity contexts and significantly outperform state-of-the-art baselines with 27.2% activity coverage, 21.13% method coverage, and 23.81% crash detection.
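Deep links are fired through Android intents, which can be exercised from the command line with adb's activity manager. The sketch below wraps the standard `adb shell am start -a android.intent.action.VIEW -d <uri>` invocation in Python for a couple of hypothetical deep-link URIs; Delm's activity context mock-up and its integration with Monkey's exploration loop are not shown.

```python
import subprocess

# Hypothetical deep links, e.g. mined from an app's AndroidManifest.xml intent filters.
DEEP_LINKS = [
    "example://orders/history",
    "example://settings/notifications",
]

def open_deep_link(uri, device=None):
    """Ask a connected device/emulator to open `uri` via an ACTION_VIEW intent."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]
    cmd += ["shell", "am", "start", "-W",
            "-a", "android.intent.action.VIEW", "-d", uri]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

for uri in DEEP_LINKS:
    print(f"--- {uri} ---")
    print(open_deep_link(uri))
```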

Citations: 0
Focused Test Generation for Autonomous Driving Systems
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-13 | DOI: 10.1145/3664605
Tahereh Zohdinasab, Vincenzo Riccio, Paolo Tonella

Testing Autonomous Driving Systems (ADSs) is crucial to ensure their reliability when navigating complex environments. ADSs may exhibit unexpected behaviours when presented, during operation, with driving scenarios containing features inadequately represented in the training dataset. To address this shift from development to operation, developers must acquire new data with the newly observed features. This data can then be utilised to fine-tune the ADS, so as to reach the desired level of reliability in performing driving tasks. However, the resource-intensive nature of testing ADSs requires efficient methodologies for generating targeted and diverse tests.

In this work, we introduce a novel approach, DeepAtash-LR, that incorporates a surrogate model into the focused test generation process. This integration significantly improves focused testing effectiveness and applicability in resource-intensive scenarios. Experimental results show that the integration of the surrogate model is fundamental to the success of DeepAtash-LR. Our approach was able to generate an average of up to 60× more targeted, failure-inducing inputs compared to the baseline approach. Moreover, the inputs generated by DeepAtash-LR were useful for significantly improving the quality of the original ADS through fine-tuning.
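The role of the surrogate model can be sketched as follows: train a cheap regressor on already-simulated scenarios and use its predictions to decide which new candidates are worth the expensive simulation. The snippet below uses scikit-learn's RandomForestRegressor with a fabricated fitness function and scenario encoding; it illustrates the surrogate idea only, not DeepAtash-LR's focused test generator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

def expensive_simulation(x):
    """Stand-in for running an ADS simulator on a test scenario x and
    returning a fitness value (lower = closer to a failure)."""
    return float(np.sin(3 * x[0]) + 0.5 * x[1] ** 2 + rng.normal(scale=0.05))

# Seed the surrogate with a few already-simulated scenarios.
X_seen = rng.uniform(-1, 1, size=(20, 2))
y_seen = np.array([expensive_simulation(x) for x in X_seen])
surrogate = RandomForestRegressor(random_state=0).fit(X_seen, y_seen)

# Generate many candidate scenarios cheaply, but simulate only the few the
# surrogate predicts to be most failure-prone.
candidates = rng.uniform(-1, 1, size=(500, 2))
ranked = candidates[np.argsort(surrogate.predict(candidates))]
for x in ranked[:5]:
    print(f"simulate {x.round(2)} -> fitness {expensive_simulation(x):.3f}")
```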

Citations: 0