Automated Software Engineering: Latest Articles

Tab: template-aware bug report title generation via two-phase fine-tuned models
IF 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2025-03-22; DOI: 10.1007/s10515-025-00505-9
Xiao Liu, Yinkang Xu, Weifeng Sun, Naiqi Huang, Song Sun, Qiang Li, Dan Yang, Meng Yan

Bug reports play a critical role in the software development lifecycle by helping developers identify and resolve defects efficiently. However, the quality of bug report titles, particularly in open-source communities, can vary significantly, which complicates the bug triage and resolution processes. Existing approaches, such as iTAPE, treat title generation as a one-sentence summarization task using sequence-to-sequence models. While these methods show promise, they face two major limitations: (1) they do not consider the distinct components of bug reports, treating the entire report as a homogeneous input, and (2) they struggle to handle the variability between template-based and non-template-based reports, often resulting in suboptimal titles. To address these limitations, we propose TAB, a hybrid framework that combines a Document Component Analyzer (DCA) based on a pre-trained BERT model and a Title Generation Model based on CodeT5. TAB addresses the first limitation by segmenting bug reports into four components (Description, Reproduction, Expected Behavior, and Others) to ensure better alignment between input and output. For the second limitation, TAB takes a two-branch approach: for template-based reports, titles are generated directly, while for non-template reports, the DCA extracts key components to improve title relevance and clarity. We evaluate TAB on both template-based and non-template-based bug reports, demonstrating that it significantly outperforms existing methods. Specifically, TAB achieves average improvements of 170.4–389.5% in METEOR, 67.8–190.0% in ROUGE-L, and 65.7–124.5% in chrF(AF) compared to baseline approaches on template-based reports. Additionally, on non-template-based reports, TAB shows an average improvement of 64% in METEOR, 3.6% in ROUGE-L, and 14.8% in chrF(AF) over the state of the art. These results confirm the robustness of TAB in generating high-quality titles across diverse bug report formats.

Citations: 0
Reinforcement learning for mutation operator selection in automated program repair
IF 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2025-03-15; DOI: 10.1007/s10515-025-00501-z
Carol Hanna, Aymeric Blot, Justyna Petke

Automated program repair techniques aim to aid software developers with the challenging task of fixing bugs. In heuristic-based program repair, a search space of mutated program variants is explored to find potential patches for bugs. Most commonly, every selection of a mutation operator during search is performed uniformly at random, which can generate many buggy, even uncompilable programs. Our goal is to reduce the generation of variants that do not compile or that break intended functionality, which wastes considerable resources. In this paper, we investigate the feasibility of a reinforcement learning-based approach for the selection of mutation operators in heuristic-based program repair. Our proposed approach is agnostic to programming language, granularity level, and search strategy, and allows for easy integration into existing heuristic-based repair tools. We conducted an extensive empirical evaluation of four operator selection techniques, two reward types, two credit assignment strategies, two integration methods, and three sets of mutation operators using 30,080 independent repair attempts. We evaluated our approach on 353 real-world bugs from the Defects4J benchmark. The reinforcement learning-based mutation operator selection results in a higher number of test-passing variants, but does not exhibit a noticeable improvement in the number of bugs patched in comparison with the baseline, uniform random selection. While reinforcement learning has been previously shown to be successful in improving the search of evolutionary algorithms, often used in heuristic-based program repair, it has yet to demonstrate such improvements when applied to this area of research.
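
To make the idea of learned operator selection concrete, here is a minimal sketch of one common bandit-style policy, epsilon-greedy. The abstract does not name the four selection techniques that were evaluated, so this is not necessarily one of them. The operator names and the reward of 1.0 for a variant that passes the tests are assumptions made only for illustration.

```python
import random


class EpsilonGreedyOperatorSelector:
    """Chooses a mutation operator with an epsilon-greedy bandit policy.

    Illustrative only: not the selection technique from the paper.
    """

    def __init__(self, operators, epsilon=0.1, seed=None):
        self.operators = list(operators)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {op: 0 for op in self.operators}
        # Running mean reward observed for each operator.
        self.values = {op: 0.0 for op in self.operators}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda op: self.values[op])

    def update(self, op, reward):
        # Incremental mean: Q <- Q + (r - Q) / n
        self.counts[op] += 1
        self.values[op] += (reward - self.values[op]) / self.counts[op]


if __name__ == "__main__":
    # Hypothetical operators and success rates, used only to drive the demo.
    success_rate = {"insert": 0.1, "delete": 0.4, "replace": 0.2}
    selector = EpsilonGreedyOperatorSelector(success_rate, epsilon=0.1, seed=0)
    env = random.Random(1)
    for _ in range(1000):
        op = selector.select()
        # Assumed reward: 1.0 if the mutated variant passes the tests.
        reward = 1.0 if env.random() < success_rate[op] else 0.0
        selector.update(op, reward)
    print(selector.counts)
```

Setting `epsilon=1.0` gives the uniform random selection that the abstract describes as the baseline, so the same class can stand in for both arms of a comparison.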

Citations: 0
A study on prompt design, advantages and limitations of ChatGPT for deep learning program repair
IF 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2025-03-07; DOI: 10.1007/s10515-025-00492-x
Jialun Cao, Meiziniu Li, Ming Wen, Shing-Chi Cheung

The emergence of large language models (LLMs) such as ChatGPT has revolutionized many fields. In particular, recent advances in LLMs have triggered various studies examining the use of these models for software development tasks, such as program repair, code understanding, and code generation. Prior studies have shown the capability of ChatGPT in repairing conventional programs. However, debugging deep learning (DL) programs poses unique challenges since the decision logic is not directly encoded in the source code. This requires LLMs to not only parse the source code syntactically but also understand the intention of DL programs. Therefore, ChatGPT’s capability in repairing DL programs remains unknown. To fill this gap, our study aims to answer three research questions: (1) Can ChatGPT debug DL programs effectively? (2) How can ChatGPT’s repair performance be improved by prompting? (3) In which way can dialogue help facilitate the repair? Our study analyzes the typical information that is useful for prompt design and suggests enhanced prompt templates that are more efficient for repairing DL programs. On top of them, we summarize the dual perspectives (i.e., advantages and disadvantages) of ChatGPT’s ability, such as its handling of API misuse and recommendation, and its shortcomings in identifying default parameters. Our findings indicate that ChatGPT has the potential to repair DL programs effectively and that prompt engineering and dialogue can further improve its performance by providing more code intention. We also identified the key intentions that can enhance ChatGPT’s program repairing capability.

Citations: 0
BallPri: test cases prioritization for deep neuron networks via tolerant ball in variable space
IF 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2025-03-06; DOI: 10.1007/s10515-025-00498-5
Chengyu Jia, Jinyin Chen, Xiaohao Li, Haibin Zheng, Luxin Zhang

Deep neural networks (DNNs) have gained widespread adoption in various applications, including some safety-critical domains such as autonomous driving. However, despite their impressive capabilities and outstanding performance, DNNs can also exhibit incorrect behaviors that may lead to serious accidents. As a result, DNNs urgently require security assurance when applied to safety-critical applications. Deep testing has been developed as an effective technique for detecting incorrect DNN behaviors and improving robustness when necessary, but it needs a large number of labeled test cases, which are expensive to obtain because data labeling is labor-intensive. Test case prioritization has been proposed to identify error-revealing test cases earlier, and several techniques such as DeepGini and PRIMA achieve effective and efficient prioritization for classification tasks. However, these methods still face challenges such as unreliable validity, limited application scenarios, and high time complexity. To tackle these issues, we present BallPri, a novel test prioritization method for DNNs that uses the tolerant ball in variable space. It extracts the tolerant balls of different test cases and uses the minimum non-parametric likelihood ratio (MinLR) to further enlarge the difference of distribution in variable space, achieving effective and general test case prioritization. Extensive experiments on benchmark datasets and models validate that BallPri outperforms the state-of-the-art methods in three key aspects: (1) Effective: it leverages the tolerant ball in variable space to identify malicious bug-revealing inputs. Compared with baselines, BallPri improves prioritization effectiveness by 47.83% and prioritization efficiency by 37.27% on average. (2) Extensible: it can be applied to various tasks, data, and models. We verify the superiority of BallPri on classification and regression tasks, convolutional and recurrent neural network models, and image, text, and speech datasets. (3) Efficient: it achieves a low time complexity compared with existing methods. We further evaluate BallPri against potential adaptive attacks and provide guidance for its accuracy and robustness. The open-source code of BallPri can be downloaded at https://github.com/lixiaohaao/BallPri.
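
For readers unfamiliar with test prioritization, the sketch below illustrates the kind of baseline the abstract mentions rather than BallPri itself. It ranks unlabeled test inputs by the Gini impurity of the classifier's output probabilities, 1 - sum(p_i^2), which is the scoring idea behind DeepGini. The tolerant ball and MinLR are not defined in the abstract, so they are not sketched here. The function names and sample outputs are hypothetical.

```python
def gini_score(probs):
    """Gini impurity of one softmax output: 1 - sum of squared probabilities.

    Higher values mean the model is less certain about this input.
    """
    return 1.0 - sum(p * p for p in probs)


def prioritize(outputs):
    """Return test-case indices ordered from most to least uncertain."""
    return sorted(
        range(len(outputs)),
        key=lambda i: gini_score(outputs[i]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical softmax outputs for three unlabeled test inputs.
    outputs = [
        [0.9, 0.1],  # confident prediction
        [0.5, 0.5],  # maximally uncertain for two classes
        [0.7, 0.3],
    ]
    print(prioritize(outputs))
```

A score of this kind depends on class probabilities, so it applies only to classification, which is consistent with the limited application scenarios the abstract attributes to earlier methods.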

Citations: 0
Bash command comment generation via multi-scale heterogeneous feature fusion
IF 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2025-03-04; DOI: 10.1007/s10515-025-00494-9
Junsan Zhang, Yang Zhu, Ao Lu, Yudie Yan, Yao Wan

Automatic generation of Bash command comments is crucial for understanding and updating commands in software maintenance. Existing mainstream methods mainly focus on learning from the sequential text of Bash commands and combining retrieval-enhanced techniques to generate comments. However, these methods overlook the syntactic structure of Bash commands, thereby limiting the quality and accuracy of generated comments. This paper proposes a heterogeneous Bash comment generation framework named HBCom, which aims to exploit the semantic information in both the token sequences and the syntactic structures of Bash commands to generate more accurate and natural command comments. The core of HBCom lies in constructing a Heterogeneous Information Graph (HIG) based on an Abstract Syntax Tree, which integrates the syntactic structure of Bash commands with the code sequence through six types of edges, providing a solid information basis for subsequent comment generation. In addition, we propose a heterogeneous and multi-scale graph neural network to capture various relationships in HIGs. Subsequently, we utilize a Transformer decoder, combined with a copy mechanism based on multi-head attention, to decode and fuse the HIG and Bash command token features, ultimately generating high-quality comments. We conduct extensive experiments on a Bash dataset, demonstrating that HBCom outperforms the baseline models in BLEU, ROUGE-L, and METEOR metrics. Furthermore, human evaluations confirm HBCom’s effectiveness in real-world application scenarios.

Citations: 0
Quantum software engineering and potential of quantum computing in software engineering research: a review
IF 2; Zone 2 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2025-03-02; DOI: 10.1007/s10515-025-00493-w
Ashis Kumar Mandal, Md Nadim, Chanchal K. Roy, Banani Roy, Kevin A. Schneider

Research in software engineering is essential for improving software development practices, leading to reliable and secure software. Leveraging the principles of quantum physics, quantum computing has emerged as a new computational paradigm that offers significant advantages over classical computing. As quantum computing progresses rapidly, its potential applications across various fields are becoming apparent. In software engineering, many tasks involve complex computations where quantum computers can greatly speed up the development process, leading to faster and more efficient solutions. With the growing use of quantum-based applications in different fields, Quantum Software Engineering (QSE) has emerged as a discipline focused on designing, developing, and optimizing quantum software for diverse applications. This paper aims to review the role of quantum computing in software engineering research and the latest developments in QSE. To our knowledge, this is the first comprehensive review on this topic. We begin by introducing quantum computing, exploring its fundamental concepts, and discussing its potential applications in software engineering. We also examine various QSE techniques that expedite software development. Finally, we discuss the opportunities and challenges in quantum-driven software engineering and QSE. Our study reveals that quantum machine learning and quantum optimization have substantial potential to address classical software engineering tasks, though this area is still limited. Current QSE tools and techniques lack robustness and maturity, indicating a need for more focus. One of the main challenges is that quantum computing has yet to reach its full potential.

Citations: 0
Exploring the potential of general purpose LLMs in automated software refactoring: an empirical study
IF 2 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-03-01 | DOI: 10.1007/s10515-025-00500-0
Bo Liu, Yanjie Jiang, Yuxia Zhang, Nan Niu, Guangjie Li, Hui Liu

Software refactoring is an essential activity for improving the readability, maintainability, and reusability of software projects. To this end, a large number of automated or semi-automated approaches/tools have been proposed to locate poorly designed code, recommend refactoring solutions, and conduct specified refactorings. However, even equipped with such tools, it remains challenging for developers to decide where and what kind of refactorings should be applied. Recent advances in deep learning techniques, especially in large language models (LLMs), make it potentially feasible to automatically refactor source code with LLMs. However, it remains unclear how well LLMs perform compared to human experts in conducting refactorings automatically and accurately. To fill this gap, in this paper, we conduct an empirical study to investigate the potential of LLMs in automated software refactoring, focusing on the identification of refactoring opportunities and the recommendation of refactoring solutions. We first construct a high-quality refactoring dataset comprising 180 real-world refactorings from 20 projects, and conduct the empirical study on the dataset. With the to-be-refactored Java documents as input, ChatGPT and Gemini identified only 28 and 7, respectively, out of the 180 refactoring opportunities. The evaluation results suggested that the performance of LLMs in identifying refactoring opportunities is generally low and remains an open problem. However, explaining the expected refactoring subcategories and narrowing the search space in the prompts substantially increased the success rate of ChatGPT from 15.6% to 86.7%. Concerning the recommendation of refactoring solutions, ChatGPT recommended 176 refactoring solutions for the 180 refactorings, and 63.6% of the recommended solutions were comparable to (or even better than) those constructed by human experts. However, 13 out of the 176 solutions suggested by ChatGPT and 9 out of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors, which indicates the risk of LLM-based refactoring.
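
The counts quoted in this abstract can be turned into rates with a few lines of arithmetic. The sketch below is only a reader's aid, not the authors' evaluation code: the counts come from the abstract, while the helper names `pct` and `implied_successes` are hypothetical. It shows that 28 of 180 matches the reported 15.6% baseline for ChatGPT, and that 86.7% would correspond to roughly 156 of 180 opportunities, assuming the same denominator (the abstract does not state that count).

```python
TOTAL_OPPORTUNITIES = 180  # refactorings in the dataset, as stated in the abstract


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


def implied_successes(rate_percent: float, total: int = TOTAL_OPPORTUNITIES) -> int:
    """Back out the nearest whole count for a reported percentage (an inference)."""
    return round(rate_percent / 100 * total)


# Opportunities identified without prompt guidance (counts from the abstract).
identified = {"ChatGPT": 28, "Gemini": 7}
# (unsafe solutions, recommended solutions) per model, from the abstract.
unsafe = {"ChatGPT": (13, 176), "Gemini": (9, 137)}

identification_rate = {m: pct(n, TOTAL_OPPORTUNITIES) for m, n in identified.items()}
unsafe_rate = {m: pct(n, d) for m, (n, d) in unsafe.items()}

if __name__ == "__main__":
    print("identification rate (%):", identification_rate)
    print("unsafe solution rate (%):", unsafe_rate)
    print("count implied by 86.7%:", implied_successes(86.7))
```

Read this way, the unsafe-solution rates for the two models are of similar magnitude even though the raw counts differ, because Gemini produced fewer solutions overall.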

Citations: 0
CodeDoctor: multi-category code review comment generation
IF 2 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-02-27 | DOI: 10.1007/s10515-025-00491-y
Yingling Li, Yuhan Wu, Zi’ao Wang, Lei Huang, Junjie Wang, Jianping Li, Minying Huang

Code review is an effective software quality assurance activity. However, this process is labor-intensive and time-consuming, requiring reviewers to carefully review under various categories (e.g., function, refactoring, documentation, etc.) to generate review comments. Several approaches have been proposed for automatic review comment generation; although they can generate review comments, they hardly cover all manual review comments, because most of them simply utilize the information of submitted code and review comments without fully modeling the features of code review (i.e., they ignore the review category and the association between issue snippets and review comments). In this paper, we propose CodeDoctor, an automatic review comment generator with data augmentation and a category-aware encoder-decoder to generate multi-category review comments. It consists of three main phases: (1) Data augmentation phase, which classifies review comments and builds review exemplars (i.e., the pairs of issue snippet and its comment) to augment review data by using a large language model (LLM) with prompt engineering and feedback loops; (2) Encoder phase, which encodes the inputs (i.e., review category, diff code and review exemplar) into semantic and token representations; (3) Decoder phase, which designs a category-focused decoder to capture the most relevant information of the given category for multi-category review comment generation. Evaluations with five commonly used and state-of-the-art baselines on two datasets show that CodeDoctor outperforms all baselines, with 1770% higher average BLEU-4, 111% higher average ROUGE-L and 49% higher average F1 than the best baseline. Furthermore, a human evaluation also confirms the significant potential of applying CodeDoctor in practical usage. Our approach can relieve the burden of reviewers by automatically generating multi-category review comments, and helps developers better detect code issues as early as possible, thereby facilitating software development.

Citations: 0
BMCo-O: a smart code smell detection method based on co-occurrences
IF 2 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-02-21 | DOI: 10.1007/s10515-025-00486-9
Feiqiao Mao, Kaihang Zhong, Long Cheng

Code smell detection is a task aimed at identifying sub-optimal programming structures within code entities that may indicate problems requiring attention. It plays a crucial role in improving software quality. Numerous automatic or semi-automatic methods for code smell detection have been proposed. However, these methods are constrained by the manual setting of detection rules and thresholds, leading to subjective determinations, or they require large-scale labeled datasets for model training. In addition, they exhibit poor detection performance across different projects. Related studies have revealed the existence of co-occurrences among different types of code smells. Therefore, we propose a smart code smell detection method based on code smell co-occurrences, termed BMCo-O. The key insight is that code smell co-occurrences can assist in improving code smell detection. We introduce and utilize a code smell co-occurrence impact factor set, a code smell pre-filter mechanism, and a possibility mechanism, which enable BMCo-O to demonstrate outstanding detection performance. To reduce manual intervention, we propose an adaptive detection mechanism that automatically adjusts parameters to detect different types of code smells in various software projects. As an initial attempt, we applied the proposed method to seven classical high-criticality code smells: Message Chain, Feature Envy, Spaghetti Code, Large Class, Complex Class, Refused Bequest, and Long Method. The evaluation results on benchmarks composed of open source software projects demonstrated that BMCo-O significantly outperforms the well-known and widely used methods in detecting these seven classical code smells, especially in F1, with improvements of 137%, 155%, 23%, 195%, 364%, 552% and 35%, respectively. To further verify its effectiveness in actual detection across different software projects, we also implemented a prototype of a new code smell detector using BMCo-O.

Citations: 0
Unveiling functional aspects in Google Play education app titles and descriptions influencing app success
IF 2 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-02-21 | DOI: 10.1007/s10515-025-00497-6
Ahmad Bilal, Hamid Turab Mirza, Adnan Ahmad, Ibrar Hussain, Ahmad Salman Khan

Users search for applications on the online application store by inputting functional terms, such as “automated assignment solver”, “English translator” and “free VPN”. In response, the application store recommends a list of applications whose titles and descriptions closely match the user’s search terms. Acknowledging this, application developers incorporate trending and frequently searched functional terms into their application titles and descriptions to make them compelling and to enhance the visibility of their products in user searches, thereby increasing the likelihood of application success. However, traditional literature analyzing mobile application titles and descriptions to determine their impact on application success is scarce and may also lack data-analytical approaches. Moreover, the definition of application success provided by existing literature may be flawed, as it solely relies on higher downloads or positive numeric ratings, neglecting the crucial factor of time. This research proposes a Machine Learning-inspired framework to extract functional themes (aspects) that influence success from the titles and descriptions of Google Play Education applications. It also formulates an enhanced definition of application success that considers downloads and ratings over a specific time period and integrates user sentiment. According to the findings of this research, themes of Math and Homework Support, Learning and Practice, Live Assistance and Tutoring, and Instant Solutions and Tools are highly correlated with success within the Education category of the Google Play store. Developers can enhance the visibility and appeal of their applications in user search results by incorporating these themes into their application titles and descriptions, ultimately leading to a higher likelihood of success.

Citations: 0