
Latest Articles in Automated Software Engineering

Measuring the impact of predictive models on the software project: A cost, service time, and risk evaluation of a metric-based defect severity prediction model
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-19 | DOI: 10.1007/s10515-025-00519-3
Umamaheswara Sharma B, Ravichandra Sadam

In a critical software system, testers must spend an enormous amount of time and effort maintaining the software due to the continuous occurrence of defects. To reduce this effort, prior works in the literature are limited to using documented defect reports to automatically predict the severity of defective software modules. In contrast, in this work, we propose a metric-based software defect severity prediction (SDSP) model built using a decision-tree-incorporated self-training semi-supervised learning approach to classify the severity of defective software modules. Empirical analysis of the proposed model on the AEEEM datasets supports the proposed approach, as it successfully assigns suitable severity class labels to the unlabelled modules. On the other hand, numerous research studies have addressed the methodological aspects of SDSP models, but the gap in estimating the performance of a developed prediction model using suitable measures remains unaddressed. To this end, we propose the risk factor, per cent of the saved budget, loss in the saved budget, per cent of remaining edits, remaining service time, and gratuitous service time to interpret the predictions in terms of project objectives. Empirical analysis of the proposed approach shows the benefit of using the proposed measures in addition to the traditional measures.
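The self-training loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a base learner is fit on labelled modules, high-confidence predictions on unlabelled modules are promoted to pseudo-labels, and the learner is refit. A nearest-centroid learner stands in for the decision tree for brevity; the metric vectors and the 0.9 confidence threshold are illustrative assumptions.

```python
import math

def centroid_fit(X, y):
    """Return per-class centroids of the labelled metric vectors."""
    cents = {}
    for xi, yi in zip(X, y):
        cents.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in cents.items()}

def centroid_predict(cents, x):
    """Predict (label, confidence) for one module's metric vector."""
    d = {c: math.dist(v, x) for c, v in cents.items()}
    ranked = sorted(d, key=d.get)
    best = ranked[0]
    # Confidence: relative margin between the two nearest centroids.
    conf = 1.0 if len(ranked) == 1 else \
        (d[ranked[1]] - d[best]) / (d[ranked[1]] + 1e-9)
    return best, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_iter):
        cents = centroid_fit(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            label, conf = centroid_predict(cents, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:          # no new pseudo-labels; stop early
            break
        for x, label in confident:  # promote confident predictions
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    cents = centroid_fit(X_lab, y_lab)
    return [centroid_predict(cents, x)[0] for x in X_unlab]

# Toy defect metrics: two well-separated severity classes.
labelled = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["minor", "minor", "major", "major"]
unlabelled = [(0.15, 0.15), (0.85, 0.85)]
print(self_train(labelled, labels, unlabelled))  # expect: minor, major
```

The same loop applies unchanged if the centroid learner is swapped for a decision tree with class-probability outputs, as in the paper.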

Citations: 0
The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-16 | DOI: 10.1007/s10515-025-00524-6
Zexian Zhang, Lin Zhu, Shuang Yin, Wenhua Hu, Shan Gao, Haoxuan Chen, Fuyang Li

Code smell detection using machine/deep learning methods aims to classify code instances as smelly or non-smelly based on extracted features. Accurate detection relies on optimizing feature sets by focusing on relevant features while discarding those that are redundant or irrelevant. However, prior studies on feature selection and reduction techniques for code smell detection have yielded inconsistent results, possibly due to limited exploration of available techniques. To address this gap, we comprehensively analyze 33 feature selection and 6 feature reduction techniques across seven classification models and six code smell datasets, applying the Scott-Knott effect size difference test to compare performance and McNemar’s test to assess prediction diversity. The results show that (1) not all feature selection and reduction techniques significantly improve detection performance; (2) feature extraction techniques generally perform worse than feature selection techniques; (3) probabilistic significance is recommended as a “generic” feature selection technique due to its higher consistency in identifying smelly instances; and (4) the high-frequency features selected by the top feature selection techniques vary by dataset, highlighting their specific relevance for identifying the corresponding code smells. Based on these findings, we provide implications for further code smell detection research.
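A filter-style feature selection technique of the kind compared in the study can be sketched as below. This is an illustrative example, not one of the 33 techniques verbatim: features are ranked by absolute Pearson correlation with the smelly/non-smelly label and the top-k are kept; the toy metric matrix is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_top_k(X, y, k):
    """X: rows of feature vectors; y: 0/1 smell labels. Returns feature indices."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: feature 0 tracks the label, feature 1 is noise, feature 2 is constant.
X = [[1, 5, 7], [2, 1, 7], [8, 4, 7], [9, 2, 7]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, k=1))  # feature 0 correlates most with y
```

Wrapper-based and embedded techniques differ by scoring feature subsets with a trained model rather than a per-feature statistic, but the selection interface stays the same.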

Citations: 0
Structural contrastive learning based automatic bug triaging
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-16 | DOI: 10.1007/s10515-025-00517-5
Yi Tao, Jie Dai, Lingna Ma, Zhenhui Ren, Fei Wang

Bug triaging is crucial for software maintenance, as it matches developers with bug reports they are most qualified to handle. This task has gained importance with the growth of the open-source community. Traditionally, methods have emphasized semantic classification of bug reports, but recent approaches focus on the associations between bugs and developers. Leveraging latent patterns from bug-fixing records can enhance triaging predictions; however, the limited availability of these records presents a significant challenge. This scarcity highlights a broader issue in supervised learning: the inadequacy of labeled data and the underutilization of unlabeled data. To address these limitations, we propose a novel framework named SCL-BT (Structural Contrastive Learning-based Bug Triaging). This framework improves the utilization of labeled heterogeneous associations through edge perturbation and leverages unlabeled homogeneous associations via hypergraph sampling. These processes are integrated with a graph convolutional network backbone to enhance the prediction of associations and, consequently, bug triaging accuracy. Experimental results demonstrate that SCL-BT significantly outperforms existing models on public datasets. Specifically, on the Google Chromium dataset, SCL-BT surpasses the GRCNN method by 18.64% in terms of the Top-9 Hit Ratio metric. The innovative approach of SCL-BT offers valuable insights for research on automatic bug triaging.
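The edge-perturbation augmentation that SCL-BT applies to the labeled bug-developer graph can be sketched as below. This is an illustrative fragment only: the drop ratio and the toy edge list are assumptions, and the GCN encoder plus the contrastive objective that would pull the two views' node embeddings together are omitted.

```python
import random

def perturb_edges(edges, drop_ratio, rng):
    """Return an augmented graph view: each edge kept with prob (1 - drop_ratio)."""
    return [e for e in edges if rng.random() >= drop_ratio]

# Bug-fixing records as (bug_id, developer_id) edges of a bipartite graph.
edges = [(0, "dev_a"), (1, "dev_a"), (2, "dev_b"), (3, "dev_b"), (4, "dev_c")]
rng = random.Random(42)  # seeded for reproducibility

# Two independently perturbed views of the same graph; a contrastive loss
# would treat the same node across views as a positive pair.
view1 = perturb_edges(edges, drop_ratio=0.2, rng=rng)
view2 = perturb_edges(edges, drop_ratio=0.2, rng=rng)
print(len(view1), len(view2))  # both are subsets of the original edge set
```

Because only edges (never nodes) are dropped, every bug and developer keeps an embedding in both views, which is what makes the cross-view positive pairs well-defined.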

Citations: 0
An empirical study of test case prioritization on the Linux Kernel
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-13 | DOI: 10.1007/s10515-025-00522-8
Haichi Wang, Ruiguo Yu, Dong Wang, Yiheng Du, Yingquan Zhao, Junjie Chen, Zan Wang

The Linux kernel is a complex and constantly evolving system, where each code change can impact different components of the system. Regression testing ensures that new changes do not affect existing functionality or introduce new defects. However, due to the complexity of the Linux kernel, maintenance remains challenging. While practices like Continuous Integration (CI) facilitate rapid commits through automated regression testing, each CI process still incurs substantial costs due to the extensive number of test cases. Traditional software testing employs test case prioritization (TCP) techniques to prioritize test cases, thus enabling the early detection of defects. Due to the unique characteristics of the Linux kernel, it remains unclear whether the existing TCP techniques are suitable for its regression testing. In this paper, we present the first empirical study comparing various TCP techniques in the Linux kernel context. Specifically, we examined a total of 17 TCP techniques, including similarity-based, information-retrieval-based, and coverage-based techniques. The experimental results demonstrate that: (1) similarity-based TCP techniques perform best on the Linux kernel, achieving a mean APFD (Average Percentage of Faults Detected) value of 0.7583 and requiring significantly less time; (2) the majority of TCP techniques show relatively stable performance across multiple commits, with similarity-based TCP techniques the most stable, showing a maximum decrease of 3.03% and 3.92% in mean and median APFD values, respectively; (3) more than half of the studied techniques are significantly affected by flaky tests, with decreases in mean and median APFD values ranging from 29.9% to 63.5%. This work takes a first look at the adoption of TCP techniques in the Linux kernel, confirming their potential for effective and efficient prioritization.
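The APFD metric reported throughout the study is defined as APFD = 1 - (TF1 + ... + TFm) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TFi the 1-based position of the first test exposing fault i. A minimal sketch, with an illustrative fault-exposure matrix of my own construction:

```python
def apfd(order, fault_sets):
    """order: prioritized test ids; fault_sets: fault id -> tests exposing it."""
    n, m = len(order), len(fault_sets)
    pos = {t: i + 1 for i, t in enumerate(order)}  # 1-based rank of each test
    # TFi: earliest position of any test exposing fault i.
    tf = [min(pos[t] for t in tests if t in pos) for tests in fault_sets.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)

faults = {"f1": {"t3"}, "f2": {"t1", "t4"}}    # which tests expose which faults
good = apfd(["t3", "t1", "t2", "t4"], faults)  # faults exposed early
bad = apfd(["t2", "t4", "t1", "t3"], faults)   # faults exposed late
print(good, bad)  # 0.75 vs 0.375: earlier fault exposure scores higher
```

Higher APFD means faults are, on average, exposed earlier in the prioritized order, which is exactly what a TCP technique optimizes for.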

Citations: 0
iALBMAD: an improved agile-based layered approach for mobile app development
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-10 | DOI: 10.1007/s10515-025-00520-w
Anil Patidar, Ugrasen Suman

The demand for improved efficiency, agility, and adaptability has led to rapid evolution in mobile app development (MAD). Agile approaches are recognized for being cooperative and iterative, but issues remain in handling the full range of MAD necessities. The objective here is to blend the best practices of several prominent agile and non-agile approaches into an innovative and improved MAD approach, which we refer to as the improved Agile and Lean-based MAD Approach (iALBMAD); this approach improves upon our previous work, ALBMAD. We pursue three improvements: discovering suitable app attributes, identifying best practices for the various MAD activities, and strengthening requirement-gathering activities. To accomplish this, we first determined from the accessible literature the app attributes that affect MAD, agile and non-agile best practices, and the role of machine learning (ML) in MAD. We then equipped ALBMAD with these aspects according to their applicability and offered it to 18 MAD experts to obtain suggestions for its improvement. Considering the experts’ opinions, a three-layered approach, iALBMAD, was developed. In iALBMAD, automation and an iterative cycle are established to meet final requirements; these revisions may boost the quality of requirements and minimize time. Specific, expert-validated best practices and app attributes suitable for each iALBMAD activity are offered, which will assist less-skilled developers. Thirteen users verified the usability of six teams’ apps created using three different approaches, and the results show that iALBMAD performs better than the other approaches. The suggested approach and findings will provide insightful information for individuals and firms aiming to improve their MAD practice.

Citations: 0
Knowledge-guided large language models are trustworthy API recommenders
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-07 | DOI: 10.1007/s10515-025-00518-4
Hongwei Wei, Xiaohong Su, Weining Zheng, Wenxing Tao, Hailong Yu, Yuqian Kuang

Application Programming Interface (API) recommendation aims to recommend APIs that meet developers’ functional requirements, compensating for developers’ lack of API knowledge. In team-based software development, developers often need to implement functionality based on specific interface parameter types predefined by the software architect. Therefore, we propose API Recommendation under specific Interface Parameter Types (APIRIP), a special variant of the API recommendation task that requires the recommended APIs to conform to the interface parameter types. To realize APIRIP, we enlist the support of Large Language Models (LLMs). However, LLMs are susceptible to the phenomenon known as hallucination, wherein they may recommend untrustworthy API sequences. Instances of this include recommending fictitious APIs, APIs whose calling conditions cannot be satisfied, or API sequences that fail to conform to the interface parameter types. To mitigate these issues, we propose a Knowledge-guided framework for LLM-based API Recommendation (KG4LLM), which incorporates knowledge-guided data augmentation and beam search. The core idea of KG4LLM is to leverage API knowledge derived from the Java Development Kit (JDK) documentation to enhance the trustworthiness of LLM-generated recommendations. Experimental results demonstrate that KG4LLM improves the trustworthiness of recommendation results and outperforms advanced LLMs in the APIRIP task.
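The knowledge-guided beam search idea can be sketched as below. This is an illustrative toy, not KG4LLM itself: the per-step candidate probabilities and the documentation-derived valid-API set are assumptions, and the hypothetical `File.checkIt` stands in for a hallucinated API. Because the beam only extends with APIs present in the knowledge set, fictitious APIs can never enter a recommended sequence.

```python
import math

def beam_search(step_candidates, known_apis, beam_width=2):
    """step_candidates: per step, a dict api -> probability from the LLM."""
    beams = [([], 0.0)]                         # (sequence, cumulative log-prob)
    for cands in step_candidates:
        nxt = []
        for seq, lp in beams:
            for api, p in cands.items():
                if api not in known_apis:       # knowledge filter: drop unknown APIs
                    continue
                nxt.append((seq + [api], lp + math.log(p)))
        beams = sorted(nxt, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]                          # best surviving sequence

known = {"File.exists", "File.createNewFile", "FileWriter.write", "FileWriter.close"}
steps = [
    {"File.exists": 0.6, "File.checkIt": 0.4},  # hallucinated API is filtered out
    {"File.createNewFile": 0.7, "FileWriter.write": 0.3},
    {"FileWriter.close": 0.9, "FileWriter.write": 0.1},
]
print(beam_search(steps, known))
```

The same filter could additionally check interface parameter types at each step, which is the APIRIP-specific constraint the paper targets.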

Citations: 0
A comparative study between Android phone and TV apps
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-05 | DOI: 10.1007/s10515-025-00514-8
Yonghui Liu, Xiao Chen, Yue Liu, Pingfan Kong, Tegawendé F. Bissyandé, Jacques Klein, Xiaoyu Sun, Li Li, Chunyang Chen, John Grundy

Smart TVs have surged in popularity, leading developers to create TV versions of mobile apps. Understanding the relationship between TV and mobile apps is key to building consistent, secure, and optimized cross-platform experiences while addressing TV-specific SDK challenges. Despite extensive research on mobile apps, TV apps have been given little attention, leaving the relationship between phone and TV apps unexplored. Our study addresses this gap by compiling an extensive collection of 3445 Android phone/TV app pairs from the Google Play Store, launching the first comparative analysis of its kind. We examined these pairs across multiple dimensions, including non-code elements, code structure, security, and privacy aspects. Our findings reveal that while these app pairs are identified by the same package names, they deploy different artifacts with varying functionality across platforms. TV apps generally exhibit less complexity in terms of hardware-dependent features and code volume but share significant resource files and components with their phone versions. Interestingly, some categories of TV apps show similar or even more severe security and privacy concerns compared to their mobile counterparts. This research aims to assist developers and researchers in understanding phone-TV app relationships, highlight domain-specific concerns necessitating TV-specific tools, and provide insights for migrating apps from mobile to TV platforms.

引用次数: 0
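The pairing idea in this study can be illustrated with a short sketch (not the authors' tooling): match phone and TV builds by package name, then diff their declared hardware/software features. The manifest dictionaries and package name below are hypothetical stand-ins for parsed AndroidManifest.xml data.

```python
# Sketch: pair phone/TV builds of an app by package name and diff the
# <uses-feature> declarations each build ships with.

def pair_and_diff(phone_manifests, tv_manifests):
    """Return {package: (phone_only_features, tv_only_features)} for shared packages."""
    diffs = {}
    for pkg in phone_manifests.keys() & tv_manifests.keys():
        phone_feats = set(phone_manifests[pkg]["uses_features"])
        tv_feats = set(tv_manifests[pkg]["uses_features"])
        diffs[pkg] = (sorted(phone_feats - tv_feats), sorted(tv_feats - phone_feats))
    return diffs

# Hypothetical manifest data for one app pair.
phone = {"com.example.app": {"uses_features": [
    "android.hardware.camera", "android.hardware.touchscreen"]}}
tv = {"com.example.app": {"uses_features": ["android.software.leanback"]}}

print(pair_and_diff(phone, tv))
```

In this toy pair, the phone build declares camera and touchscreen features absent from the TV build, mirroring the paper's observation that TV apps depend on fewer hardware features.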
Improving prompt tuning-based software vulnerability assessment by fusing source code and vulnerability description 通过融合源代码和漏洞描述,改进基于提示调优的软件漏洞评估
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-05-03 DOI: 10.1007/s10515-025-00525-5
Jiyu Wang, Xiang Chen, Wenlong Pei, Shaoyu Yang

To effectively allocate resources for vulnerability remediation, it is crucial to prioritize vulnerability fixes based on vulnerability severity. With the increasing number of vulnerabilities in recent years, there is an urgent need for automated methods for software vulnerability assessment (SVA). Most previous SVA studies mainly rely on traditional machine learning methods. Recently, fine-tuning pre-trained language models has emerged as an intuitive method for improving performance. However, there is a gap between pre-training and fine-tuning, and their performance heavily depends on the quality of the downstream task’s dataset. Therefore, we propose a prompt tuning-based method, PT-SVA. Different from the fine-tuning paradigm, the prompt-tuning paradigm adds prompts to make the training process similar to pre-training, thereby better adapting to downstream tasks. Moreover, previous research aimed to automatically predict severity by analyzing only either the vulnerability descriptions or the source code of the vulnerability. Therefore, we further consider both types of vulnerability information for designing hybrid prompts (i.e., a combination of hard and soft prompts). To evaluate PT-SVA, we construct an SVA dataset based on the CVSS V3 standard, while previous SVA studies only consider the CVSS V2 standard. Experimental results show that PT-SVA outperforms ten state-of-the-art SVA baselines, for example by 13.7% to 42.1% in terms of MCC. Finally, our ablation experiments confirm the effectiveness of PT-SVA’s design, specifically in replacing fine-tuning with prompt tuning, incorporating both types of vulnerability information, and adopting hybrid prompts. Our promising results indicate that prompt tuning-based SVA is a promising direction and needs more follow-up studies.

为了有效地为漏洞修复分配资源，根据漏洞严重程度对漏洞修复进行优先排序是至关重要的。随着近年来软件漏洞数量的不断增加，对软件漏洞自动化评估方法的需求日益迫切。以往的SVA研究大多依赖于传统的机器学习方法。最近，微调预训练语言模型已经成为提高性能的一种直观方法。然而，预训练和微调之间存在差距，它们的性能在很大程度上取决于下游任务的数据集质量。因此，我们提出了一种基于提示调优的方法PT-SVA。与微调范式不同，提示调优范式包括添加提示，使训练过程类似于预训练，从而更好地适应下游任务。而且，以往的研究主要是通过分析漏洞描述或漏洞源代码来自动预测漏洞的严重程度。因此，我们进一步考虑这两种类型的漏洞信息来设计混合提示(即硬提示和软提示的组合)。为了评估PT-SVA，我们基于CVSS V3标准构建了SVA数据集，而以往的SVA研究只考虑CVSS V2标准。实验结果表明，PT-SVA优于10个最先进的SVA基线，例如在MCC方面高出13.7%至42.1%。最后，我们的消融实验证实了PT-SVA设计的有效性，特别是在用提示调优取代微调、结合两种类型的漏洞信息以及采用混合提示方面。我们的研究结果表明，基于提示调优的SVA是一个有希望的方向，需要更多的后续研究。
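The hybrid-prompt idea (hard template text plus learnable soft tokens, fusing both vulnerability inputs) can be sketched as below. The `[SOFT]` placeholder, template wording, and function name are our own illustration, not PT-SVA's actual implementation; in a real prompt-tuning setup the soft tokens would be replaced by trainable embeddings and `[MASK]` filled in by a pre-trained masked language model.

```python
# Sketch: assemble a hybrid prompt from a vulnerability description and its
# source code; hard prompt = fixed template text, soft prompt = [SOFT] slots.

SOFT = "[SOFT]"  # placeholder a prompt-tuning model would swap for a trainable embedding

def build_hybrid_prompt(description, source_code, n_soft=4):
    """Fuse both kinds of vulnerability information into one cloze-style prompt."""
    soft_span = " ".join([SOFT] * n_soft)
    return (f"{soft_span} Vulnerability description: {description} "
            f"Vulnerable code: {source_code} "
            f"The severity of this vulnerability is [MASK].")

print(build_hybrid_prompt("buffer overflow in the parser", "memcpy(dst, src, len);"))
```

Because the cloze template mirrors the masked-language-modeling objective, severity prediction becomes a fill-in-the-blank task, which is what narrows the pre-training/fine-tuning gap the abstract describes.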
引用次数: 0
A systematic mapping study on automated negotiation for autonomous intelligent systems 自主智能系统自动协商的系统映射研究
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-05-02 DOI: 10.1007/s10515-025-00515-7
Mashal Afzal Memon, Gian Luca Scoccia, Marco Autili

Autonomous intelligent systems are artificial intelligence software entities that can act on their own and make decisions without any human intervention. The communication between such systems to reach an agreement for problem-solving is known as automated negotiation. This study aims to systematically identify and analyze the literature on automated negotiation from four distinct viewpoints: (1) the existing literature on negotiation with a focus on automation, (2) the specific purpose and application domain of the studies published in the domain of automated negotiation, (3) the inputs and techniques used to model the negotiation process, and (4) the limitations of the state of the art and future research directions. For this purpose, we performed a systematic mapping study (SMS) starting from 73,760 potentially relevant studies belonging to 24 conference proceedings and 22 journal issues. Through a precise selection procedure, we identified 50 primary studies, published from the year 2000 onward, which were analyzed by applying a classification framework. As a result, we provide: (a) a classification framework to analyze the automated negotiation literature according to several parameters (e.g., the focus of the paper, the inputs required to carry out the negotiation process, the techniques applied, and the types of agents involved in the negotiation), (b) an up-to-date map of the literature specifying the purpose and application domain of each study, (c) a list of techniques used to automate the negotiation process and a list of inputs to carry out the negotiation, and (d) a discussion of promising challenges and their consequences for future research. We also provide a replication package to help researchers replicate and verify our systematic mapping study. The results and findings will benefit researchers and practitioners in identifying the research gap and conducting further research to bring dedicated solutions for automated negotiation.

自主智能系统被称为人工智能软件实体，可以自行行动，可以在没有任何人为干预的情况下做出决定。这些系统之间为达成解决问题的协议而进行的通信称为自动协商。本研究旨在从四个不同的角度系统地识别和分析自动化谈判的文献：(1)现有的以自动化为重点的谈判文献；(2)在自动化谈判领域发表的研究的具体目的和应用领域；(3)用于谈判过程建模的输入和技术；(4)技术现状的局限性和未来的研究方向。为此，我们进行了一项系统映射研究(SMS)，从24个会议论文集和22个期刊的73,760项潜在相关研究开始。通过精确的选择程序，我们确定了自2000年以来发表的50项主要研究，并通过应用分类框架对其进行分析。因此，我们提供：(a)根据几个参数(例如，论文的焦点、进行谈判过程所需的输入、应用的技术和谈判中涉及的代理类型)分析自动化谈判文献的分类框架；(b)详细说明每项研究的目的和应用领域的最新文献地图；(c)用于自动化谈判过程的技术清单和执行谈判的输入清单；(d)讨论有希望的挑战及其对未来研究的影响。我们还提供了一个复制包，以帮助研究人员复制和验证我们的系统映射研究。研究结果和发现将有利于研究人员和从业人员确定研究差距，并开展进一步的研究，为自动谈判提供专用解决方案。
引用次数: 0
ExtRep: a GUI test repair method for mobile applications based on test-extension 基于test-extension的移动应用GUI测试修复方法
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-04-25 DOI: 10.1007/s10515-025-00513-9
Yonghao Long, Yuanyuan Chen, Chu Zeng, Xiangping Chen, Xing Chen, Xiaocong Zhou, Jingru Yang, Gang Huang, Zibin Zheng

GUI testing ensures software quality and user experience in ever-changing mobile application development. Using test scripts is one of the main GUI testing approaches, but scripts may become obsolete when the GUI changes as the app evolves. Current studies often rely on textual or visual similarity to perform test repair, but may be less effective when the interacted event sequence changes dramatically. In interaction design, practitioners often provide multiple entry points to access the same function to gain higher openness and flexibility, which indicates that there may be multiple routes for reference in test repair. To evaluate the feasibility, we first conducted an exploratory study on 37 tests from 18 apps. The result showed that over 81% of the tests could be represented with alternative event paths, and using the extended paths could help enhance the test replay rate. Based on this finding, we propose a test-extension-based test repair algorithm named ExtRep. The method first uses test-extension to find alternative paths with similar test objectives based on feature coverage, and then finds the repaired result with the help of the sequence transduction probability proposed in the NLP area. Experiments conducted on 40 popular applications demonstrate that ExtRep can achieve a success rate of 73.68% in repairing 97 tests, which significantly outperforms the current approaches Water, Meter, and Guider. Moreover, the test-extension approach displays immense potential for optimizing test repairs. A tool that implements ExtRep is available for practical use and future research.

在不断变化的移动应用开发中，GUI测试确保了软件质量和用户体验。使用测试脚本是主要的GUI测试方式之一，但当GUI随着应用程序的发展而变化时，它可能已经过时了。目前的研究往往依赖于文本或视觉相似性来进行测试修复，但当交互的事件序列发生显著变化时，可能效果较差。在交互设计中，从业者通常会提供多个入口点来访问同一个功能，以获得更高的开放性和灵活性，这表明在测试修复中可能会有多个路径可供参考。为了评估可行性，我们首先对18个应用程序的37个测试进行了探索性研究。结果表明，81%以上的测试可以用备选事件路径表示，使用扩展路径可以提高测试重放率。基于这一发现，我们提出了一种基于测试扩展的测试修复算法，命名为ExtRep。该方法首先利用基于特征覆盖率的测试扩展来寻找具有相似测试目标的备选路径，然后利用NLP领域中提出的序列转导概率来寻找修复结果。在40个流行的应用中进行的实验表明，在修复97个测试中，ExtRep的成功率为73.68%，显著优于目前的方法Water、Meter和Guider。此外，测试扩展方法显示了优化测试修复的巨大潜力。实现ExtRep的工具可供实际使用和未来研究。
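The path-selection step described above can be sketched in miniature: among alternative event paths that reach the same test objective, prefer the one whose covered features best overlap the original (now-broken) test. A plain Jaccard similarity stands in here for the paper's feature-coverage and sequence-transduction scoring, and all path/feature names are hypothetical.

```python
# Sketch: rank candidate alternative event paths by feature-coverage overlap
# with the original test, and keep the closest one as the repair.

def jaccard(a, b):
    """Overlap between two feature sets (1.0 = identical coverage)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pick_repair_path(original_features, candidates):
    """candidates: {path_name: covered feature set}; return the best-matching path."""
    return max(candidates, key=lambda p: jaccard(original_features, candidates[p]))

original = {"login", "open_settings", "toggle_dark_mode"}
candidates = {
    "via_sidebar": {"login", "open_sidebar", "open_settings", "toggle_dark_mode"},
    "via_search": {"login", "search_settings"},
}
print(pick_repair_path(original, candidates))  # → via_sidebar
```

The sidebar route wins because it still covers all three original features despite inserting an extra event, which is exactly the kind of alternative entry point the exploratory study found for over 81% of tests.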
引用次数: 0