
Latest publications from ACM Transactions on Software Engineering and Methodology

Automatic Repair of Quantum Programs via Unitary Operation
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-11 | DOI: 10.1145/3664604
Yuechen Li, Hanyu Pei, Linzhi Huang, Beibei Yin, Kai-Yuan Cai

With the continuous advancement of quantum computing (QC), the demand for high-quality quantum programs (QPs) is growing. To avoid program failures, the software engineering technique of automatic program repair (APR) applies appropriate patches to remove potential bugs without human intervention. However, methods tailored to repairing defective QPs are still absent. This paper proposes a new APR method named UnitAR that repairs QPs automatically via unitary operations. Based on the characteristics of superposition and entanglement in QC, the paper constructs an algebraic model and adopts a generate-and-validate approach for the repair procedure. Furthermore, the paper presents two schemes that respectively promote the efficiency of generating patches and guarantee the effectiveness of applying patches. To evaluate the proposed method, the paper selects 29 mutated versions as well as 5 real-world buggy programs as subjects, and introduces two traditional APR approaches, GenProg and TBar, as baselines. According to the experiments, UnitAR can fix 23 buggy programs and demonstrates the highest efficiency and effectiveness among the three APR approaches. Moreover, the experimental results further demonstrate the crucial roles of the two constituents of the UnitAR framework.
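As a toy illustration of the generate-and-validate idea behind unitary-operation repair, the sketch below enumerates single-qubit gate insertions into a buggy gate sequence and keeps the first patch that passes all state-vector test cases. The gate library, program encoding, and `generate_and_validate` helper are hypothetical simplifications, not UnitAR's actual algebraic model.

```python
import numpy as np

# Hypothetical single-qubit patch library; this toy sketch is not
# UnitAR's actual implementation.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
GATES = {"H": H, "X": X, "Z": Z}

def run(program, state):
    """Apply a sequence of 2x2 unitaries to a state vector, left to right."""
    for gate in program:
        state = gate @ state
    return state

def generate_and_validate(buggy, tests):
    """Insert one library gate at each position (generate) and return the
    first patched program that passes every (input, expected) pair (validate)."""
    for pos in range(len(buggy) + 1):
        for name, gate in GATES.items():
            patched = buggy[:pos] + [gate] + buggy[pos:]
            if all(np.allclose(run(patched, s_in), s_out) for s_in, s_out in tests):
                return name, pos, patched
    return None

# Intended program applies X then H; the buggy version dropped the H.
zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])
tests = [(zero, run([X, H], zero)), (one, run([X, H], one))]
fix = generate_and_validate([X], tests)
```

Here the search finds that inserting an H after the X makes all tests pass; a real repair tool would additionally bound the search and rank candidate patches.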

Citations: 0
Supporting Emotional Intelligence, Productivity and Team Goals while Handling Software Requirements Changes
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-11 | DOI: 10.1145/3664600
Kashumi Madampe, Rashina Hoda, John Grundy

Background: Research shows that emotional intelligence (EI) should be used alongside cognitive intelligence during requirements change (RC) handling in Software Engineering (SE), especially in agile settings. Objective: We wanted to study the role of EI in depth during RC handling. Method: We conducted a mixed-methods study (an interview study followed by a survey study) with 124 software practitioners. Findings: We found that the causal condition, intervening condition, and causes lead to the key direct consequences of regulating one's own emotions and managing relationships, and the extended consequences of sustaining productivity and setting and sustaining team goals. We found several strategies for supporting EI during RC handling. Further, we found strong correlations between six strategies and being aware of one's own emotions, regulating one's own emotions, sustaining team productivity, and setting and sustaining team goals. Conclusion: Empathising with others and tracking commitments and decisions as a team are key strategies that correlate strongly with managing emotions, sustaining team productivity, and setting and sustaining team goals. To the best of our knowledge, the framework we present in this paper is the first theoretical framework on EI in SE research. We provide recommendations for software practitioners to consider during RC handling.

Citations: 0
Deep Domain Adaptation With Max-Margin Principle for Cross-Project Imbalanced Software Vulnerability Detection
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-09 | DOI: 10.1145/3664602
Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, Dinh Phung

Software vulnerabilities (SVs) have become a common, serious, and crucial concern due to the ubiquity of computer software. Many AI-based approaches have been proposed to solve the software vulnerability detection (SVD) problem to ensure the security and integrity of software applications (in both the development and testing phases). However, there are still two open and significant issues for SVD: (i) learning automatic representations to improve the predictive performance of SVD, and (ii) tackling the scarcity of labeled vulnerability datasets, which conventionally require laborious labeling effort by experts. In this paper, we propose a novel approach to tackle these two crucial issues. We first exploit automatic representation learning with deep domain adaptation for SVD. We then propose a novel cross-domain kernel classifier leveraging the max-margin principle to significantly improve the transfer learning of SVs from imbalanced labeled projects to imbalanced unlabeled projects. Our approach is the first work that leverages the solid theory of the max-margin principle, kernel methods, and bridging the gap between source and target domains for imbalanced domain adaptation (DA) applied in cross-project SVD. The experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method improves F1-measure, one of the most important measures in SVD, by 1.83% to 6.25% over the second-best method on the used datasets.
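The max-margin principle the paper builds on can be illustrated with a minimal linear hinge-loss classifier trained by subgradient descent on an imbalanced toy dataset; this sketch shows only the principle, not the paper's cross-domain kernel classifier or its domain-adaptation machinery.

```python
import numpy as np

def train_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear max-margin classifier: minimize hinge loss plus an L2
    penalty by subgradient descent (a sketch of the principle only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points violating the margin
        if viol.any():
            w -= lr * (lam * w - (y[viol, None] * X[viol]).mean(axis=0))
            b -= lr * (-y[viol].mean())
        else:
            w -= lr * lam * w                   # only the regularizer remains
    return w, b

rng = np.random.default_rng(0)
# Imbalanced toy "vulnerable vs. clean" data: only 5 positive samples.
X = np.vstack([rng.normal(2.0, 0.5, (5, 2)), rng.normal(-2.0, 0.5, (50, 2))])
y = np.array([1] * 5 + [-1] * 50)
w, b = train_svm(X, y)
preds = np.sign(X @ w + b)
```

Because the hinge loss only counts margin violations, the few positive samples still shape the decision boundary, which is one reason max-margin methods are attractive for imbalanced data.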

Citations: 0
Fairness Testing of Machine Translation Systems
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-09 | DOI: 10.1145/3664608
Zeyu Sun, Zhenpeng Chen, Jie Zhang, Dan Hao

Machine translation is integral to international communication and extensively employed in diverse human-related applications. Despite remarkable progress, fairness issues persist within current machine translation systems. In this paper, we propose FairMT, an automated fairness testing approach tailored for machine translation systems. FairMT operates on the assumption that translations of semantically similar sentences, containing protected attributes from distinct demographic groups, should maintain comparable meanings. It comprises three key steps: (1) test input generation, producing inputs covering various demographic groups; (2) test oracle generation, identifying potential unfair translations based on semantic similarity measurements; and (3) regression, discerning genuine fairness issues from those caused by low-quality translation. Leveraging FairMT, we conduct an empirical study on three leading machine translation systems—Google Translate, T5, and Transformer. Our investigation uncovers up to 832, 1,984, and 2,627 unfair translations across the three systems, respectively. Intriguingly, we observe that fair translations tend to exhibit superior translation performance, challenging the conventional wisdom of a fairness-performance trade-off prevalent in the fairness literature.
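The metamorphic flavor of this setup can be sketched as follows: mutate a protected attribute in the source sentence, translate both variants, and flag pairs whose outputs diverge below a similarity threshold. `fake_translate` is a hard-coded stand-in for a real MT system, and the threshold and similarity metric are illustrative assumptions, not FairMT's components.

```python
from difflib import SequenceMatcher

def fake_translate(sentence):
    """Hard-coded stand-in for a machine translation system."""
    table = {
        "He is a doctor and works hard.": "Er ist Arzt und arbeitet hart.",
        "She is a doctor and works hard.": "Sie ist Krankenschwester und arbeitet hart.",
    }
    return table[sentence]

def mutate(sentence, a="He", b="She"):
    """Swap a protected attribute to build the metamorphic counterpart."""
    return sentence.replace(a, b)

def fairness_violation(sentence, threshold=0.9):
    """Flag the pair if the two translations diverge beyond the threshold."""
    out_a = fake_translate(sentence)
    out_b = fake_translate(mutate(sentence))
    return SequenceMatcher(None, out_a, out_b).ratio() < threshold

flag = fairness_violation("He is a doctor and works hard.")
```

The mocked output mistranslates "doctor" as "Krankenschwester" ("nurse") only for the female variant, which is exactly the kind of demographically divergent pair such a check is meant to surface.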

Citations: 0
Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-09 | DOI: 10.1145/3664606
Wei Ma, Shangqing Liu, Mengjie Zhao, Xiaofei Xie, Wenhang Wang, Qiang Hu, Jie Zhang, Yang Liu

Code models have made significant advancements in code intelligence by encoding knowledge about programming languages. While previous studies have explored the capabilities of these models in learning code syntax, there has been limited investigation into their ability to understand code semantics. Additionally, existing analyses assume that the number of edges between nodes in the abstract syntax tree (AST) is related to syntax distance, and they often require transforming the high-dimensional space of deep learning models to a low-dimensional one, which may introduce inaccuracies. To study how code models represent code syntax and semantics, we conduct a comprehensive analysis of 7 code models, including four representative pre-trained code models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) and three large language models (StarCoder, CodeLlama, and CodeT5+). We design four probing tasks to assess the models' capacities for learning both code syntax and semantics. These probing tasks reconstruct code syntax and semantics structures (AST, CDG, DDG, and CFG) in the representation space; these structures are core concepts for code understanding. We also investigate the syntactic role encoded in each token representation and the long-range dependencies between code tokens. Additionally, we analyze the distribution of attention weights related to code semantic structures. Through extensive analysis, our findings highlight the strengths and limitations of different code models in learning code syntax and semantics. The results demonstrate that these models excel at learning code syntax, successfully capturing the syntax relationships between tokens and the syntax roles of individual tokens. However, their performance in encoding code semantics varies. CodeT5 and CodeBERT demonstrate proficiency in capturing control and data dependencies, while UnixCoder shows weaker performance in this aspect. We do not observe LLMs generally performing much better than pre-trained models.
The shallow layers of LLMs perform better than their deep layers. The investigation of attention weights reveals that different attention heads play distinct roles in encoding code semantics. Our research findings emphasize the need for further enhancements in code models to better learn code semantics. This study contributes to the understanding of code models’ abilities in syntax and semantics analysis. Our findings provide guidance for future improvements in code models, facilitating their effective application in various code-related tasks.
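A probing task of the general kind described above can be sketched as fitting a frozen linear probe that predicts a syntactic property from token representations. The random vectors below stand in for real model embeddings, and the "is identifier" label is a hypothetical target, not one of the paper's four probing tasks.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n=200, dim=16):
    """Synthetic stand-ins for token representations: one direction of
    the space encodes a hypothetical 'is identifier' syntax bit."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, dim))
    X[:, 0] += 3.0 * y                    # signal dimension
    return X, y

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def train_probe(X, y, lr=0.1, epochs=500):
    """Fit a linear (logistic-regression) probe on frozen representations."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))    # sigmoid
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient of the logistic loss
    return w

X, y = make_data()
w = train_probe(X, y)
acc = ((1.0 / (1.0 + np.exp(-(add_bias(X) @ w))) > 0.5) == y).mean()
```

High probe accuracy is taken as evidence that the frozen representations linearly encode the property; the probe itself stays deliberately simple so the credit goes to the embeddings, not the classifier.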

Citations: 0
Replication in Requirements Engineering: the NLP for RE Case
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-15 | DOI: 10.1145/3658669
Sallam Abualhaija, Fatma Başak Aydemir, Fabiano Dalpiaz, Davide Dell’Anna, Alessio Ferrari, Xavier Franch, Davide Fucci

[Context] Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks’ inherent hairiness, and, in turn, the heterogeneous reporting structure. [Solution] To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers emphasizing replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. [Results] In this paper: (i) we report on hands-on experiences of replication, (ii) we review the state-of-the-art and extract replication-relevant information, (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction, and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. [Contribution] This study aims to create awareness of replication in NLP for RE. We propose an ID-Card that is intended to foster study replication, but can also be used in other contexts, e.g., for educational purposes.

Citations: 0
BatFix: Repairing language model-based transpilation
IF 4.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-12 | DOI: 10.1145/3658668
Daniel Ramos, Inês Lynce, Vasco Manquinho, Ruben Martins, Claire Le Goues

To keep up with changes in requirements, frameworks, and coding practices, software organizations might need to migrate code from one language to another. Source-to-source migration, or transpilation, is often a complex, manual process. Transpilation requires expertise in both the source and target language, making it highly laborious and costly. Language models for code generation and transpilation are becoming increasingly popular. However, despite capturing code structure well, code generated by language models is often spurious and contains subtle problems. We propose BatFix, a novel approach that augments language models for transpilation by leveraging program repair and synthesis to fix the code generated by these models. BatFix takes as input the original program, the target program generated by the machine translation model, and a set of test cases, and outputs a repaired program that passes all test cases. Experimental results show that our approach is agnostic to language models and programming languages. BatFix can locate bugs spanning multiple lines and synthesize patches for syntax and semantic bugs in programs migrated from Java to C++ and Python to C++ by multiple language models, including OpenAI's Codex.
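The generate-and-validate loop such a test-driven repair pipeline relies on can be sketched in a few lines: enumerate candidate fixes of a (hypothetical) mistranspiled function and keep the first one that passes the original program's test cases. The buggy function and patch candidates below are illustrative, not BatFix's actual fault localization or patch synthesis.

```python
# Hypothetical mistranspiled function: the negation was lost in translation.
BUGGY = "def absval(x):\n    return x if x > 0 else x"

# Illustrative patch candidates a repair tool might enumerate.
CANDIDATES = [
    BUGGY.replace("else x", "else -x"),
    BUGGY.replace("x > 0", "x >= 0"),
]

# Test cases carried over from the original program.
TESTS = [(-3, 3), (0, 0), (5, 5)]

def passes(src):
    """Compile a candidate and validate it against all test cases."""
    namespace = {}
    exec(src, namespace)
    return all(namespace["absval"](arg) == expected for arg, expected in TESTS)

repaired = next((c for c in CANDIDATES if passes(c)), None)
```

The carried-over test suite plays the role of the oracle here: a candidate counts as a repair only when every original test passes.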

Citations: 0
MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases MR-Scout:从现有测试用例自动合成变形关系
IF 4.4 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-04-09 DOI: 10.1145/3656340
Congying Xu, Valerio Terragni, Hengcheng Zhu, Jiarong Wu, Shing-Chi Cheung

Metamorphic Testing (MT) alleviates the oracle problem by defining oracles based on metamorphic relations (MRs), which govern multiple related inputs and their outputs. However, designing MRs is challenging, as it requires domain-specific knowledge. This hinders the widespread adoption of MT. We observe that developer-written test cases can embed domain knowledge that encodes MRs. Such encoded MRs could be synthesized for testing not only their original programs but also other programs that share similar functionalities.

In this paper, we propose MR-Scout to automatically synthesize MRs from test cases in open-source software (OSS) projects. MR-Scout first discovers MR-encoded test cases (MTCs), and then synthesizes the encoded MRs into parameterized methods (called codified MRs), and filters out MRs that demonstrate poor quality for new test case generation. MR-Scout discovered over 11,000 MTCs from 701 OSS projects. Experimental results show that over 97% of codified MRs are of high quality for automated test case generation, demonstrating the practical applicability of MR-Scout. Furthermore, codified-MRs-based tests effectively enhance the test adequacy of programs with developer-written tests, leading to 13.52% and 9.42% increases in line coverage and mutation score, respectively. Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.

元变形测试(MT)通过基于元变形关系(MR)定义神谕来缓解神谕问题,元变形关系管理多个相关输入及其输出。然而,由于需要特定领域的知识,MR 的设计极具挑战性。这阻碍了 MT 的广泛应用。我们发现,开发人员编写的测试用例可以嵌入编码 MR 的领域知识。这种编码的 MR 不仅可以合成用于测试原始程序,还可以用于测试具有类似功能的其他程序。在本文中,我们提出了 MR-Scout,用于从开源软件(OSS)项目的测试用例中自动合成磁共振。MR-Scout 首先发现 MR 编码的测试用例(MTC),然后将编码的 MR 合成为参数化方法(称为编码的 MR),并过滤掉质量较差的 MR,以便生成新的测试用例。MR-Scout 从 701 个开放源码软件项目中发现了 11,000 多个 MTC。实验结果表明,超过 97% 的已编码 MR 对于自动生成测试用例具有较高的质量,这证明了 MR-Scout 的实用性。此外,基于编码磁共振的测试有效提高了开发人员编写测试的程序的测试充分性,使行覆盖率和突变分数分别提高了 13.52% 和 9.42%。我们的定性研究表明,55.76% 到 76.92% 的编码 MR 对于开发人员来说是易于理解的。
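A codified MR in the paper's sense is a parameterized method that encodes the relation between a source input and a follow-up input. A minimal, hypothetical example for a sorting routine (not taken from MR-Scout's actual output) might look like this:

```python
import random

def codified_mr_sort_permutation(sort_fn, xs):
    """Codified MR (sketch): permuting the input must not change the sorted output.

    Source input: xs. Follow-up input: a random permutation of xs.
    The MR holds if both inputs yield the same result, so no explicit
    expected value (oracle) is needed.
    """
    follow_up = xs[:]
    random.shuffle(follow_up)
    return sort_fn(xs) == sort_fn(follow_up)

# The same codified MR can test any program with similar functionality,
# e.g. Python's built-in sorted:
assert codified_mr_sort_permutation(sorted, [3, 1, 2, 1])
```

Feeding fresh generated inputs through such a parameterized method is what lets codified MRs raise line coverage and mutation score beyond the original developer-written tests.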
Citations: 0
A Survey of Source Code Search: A 3-Dimensional Perspective 源代码搜索调查:三维视角
IF 4.4 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-04-06 DOI: 10.1145/3656341
Weisong Sun, Chunrong Fang, Yifei Ge, Yuling Hu, Yuchen Chen, Quanjun Zhang, Xiuting Ge, Yang Liu, Zhenyu Chen

(Source) code search has received wide attention from software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement, usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed. These techniques improve code search performance mainly by optimizing three core components: a query understanding component, a code understanding component, and a query-code matching component. In this paper, we provide a survey of code search from a 3-dimensional perspective. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. These optimization techniques are proposed to enhance the performance of specific components, and thus the overall performance of code search. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive and detailed summary of each dimension. To understand the research trends along the three dimensions in existing code search studies, we systematically review 68 relevant studies. Unlike existing code search surveys that focus only on the query end or the code end, or that cover various aspects (including the codebase, evaluation metrics, modeling techniques, etc.) only shallowly, our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used at the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work.

(源代码搜索可以提高软件开发的效率和质量,因此受到软件工程研究人员的广泛关注。给定一个通常用自然语言句子描述的功能需求,代码搜索系统可以从大规模代码语料库(如 GitHub)中检索出满足该需求的代码片段。为了实现高效的代码搜索,人们相继提出了许多技术。这些技术主要通过优化三个核心组件来提高代码搜索性能,包括查询理解组件、代码理解组件和查询-代码匹配组件。本文从三维角度对代码搜索进行了研究。具体来说,我们将现有的代码搜索研究按照其优化的具体组件分为查询端优化技术、代码端优化技术和匹配端优化技术。这些优化技术的提出是为了提高特定组件的性能,从而提高代码搜索的整体性能。考虑到每个末端都可以独立优化并对代码搜索性能做出贡献,我们将每个末端视为一个维度。因此,本调查报告具有三维性质,对每个维度的细节进行了全面总结。为了了解现有代码搜索研究中三个维度的研究趋势,我们系统地回顾了 68 篇相关文献。与现有的代码搜索研究只关注查询端或代码端或浅层次介绍各方面(包括代码库、评估指标、建模技术等)不同,我们的调查对三端所使用的底层技术的演变和发展进行了更细致的分析和回顾。在对现有工作进行系统回顾和总结的基础上,我们概述了三端中有待在未来工作中解决的若干挑战和机遇。
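To make the three components concrete, here is a deliberately minimal, hypothetical match-end sketch. Both the query and each code snippet are "understood" as bags of words, and the match end ranks snippets by cosine similarity; real systems use far richer query, code, and matching models, but the three-component pipeline is the same.

```python
import math
from collections import Counter

def tokenize(text: str) -> Counter:
    # Query/code "understanding" here is just lowercased word splitting.
    return Counter(text.lower().replace("_", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, corpus: dict) -> list:
    """Match end (sketch): rank snippet ids by query-code similarity."""
    q = tokenize(query)
    return sorted(corpus, key=lambda k: cosine(q, tokenize(corpus[k])),
                  reverse=True)

corpus = {
    "s1": "def read_file(path): return open(path).read()",
    "s2": "def sort_list(xs): return sorted(xs)",
}
print(search("read a file", corpus)[0])  # s1
```

Query-end, code-end, and match-end optimizations correspond to improving `tokenize` for queries, `tokenize` for code, and `cosine`/`search`, respectively, which is exactly why the survey can treat each end as an independent dimension.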
Citations: 0
Help Them Understand: Testing and Improving Voice User Interfaces 帮助他们理解:测试和改进语音用户界面
IF 4.4 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-04-05 DOI: 10.1145/3654438
Emanuela Guglielmi, Giovanni Rosa, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

Voice-based virtual assistants are becoming increasingly popular. Such systems provide frameworks to developers for building custom apps. End-users can interact with such apps through a Voice User Interface (VUI), which allows the user to perform actions with natural language commands. Testing such apps is not trivial: the same command can be expressed in different, semantically equivalent ways. In this paper, we introduce VUI-UPSET, an approach that adapts chatbot-testing approaches to VUI-testing. We conducted an empirical study to understand how VUI-UPSET compares to two state-of-the-art approaches (i.e., a chatbot testing technique and ChatGPT) in terms of (i) the correctness of the generated paraphrases and (ii) the capability of revealing bugs. To this aim, we analyzed 14,898 generated paraphrases for 40 Alexa Skills. Our results show that VUI-UPSET generates more bug-revealing paraphrases than the two baselines, although ChatGPT is the approach generating the highest percentage of correct paraphrases. We also tried to use the generated paraphrases to improve the skills, by including in the voice interaction models of the skills (i) only the bug-revealing paraphrases or (ii) all the valid paraphrases. We observed that including only bug-revealing paraphrases is sometimes not sufficient to make all the tests pass.

基于语音的虚拟助手越来越受欢迎。这类系统为开发人员提供了开发定制应用程序的框架。终端用户可通过语音用户界面(VUI)与此类应用程序进行交互,该界面允许用户使用自然语言命令执行操作。测试此类应用程序并非易事:相同的命令可以用不同的语义等价方式表达。在本文中,我们介绍了 VUI-UPSET,这是一种将聊天机器人测试方法应用于 VUI 测试的方法。我们进行了一项实证研究,以了解 VUI-UPSET 与两种最先进的方法(即聊天机器人测试技术和 ChatGPT)在以下方面的比较情况:(i) 生成解析的正确性;(ii) 揭示错误的能力。为此,我们分析了为 40 个 Alexa 技能生成的 14,898 条转述。结果表明,VUI-UPSET 生成的能揭示错误的转述比两个基线方法多,而 ChatGPT 生成的转述正确率最高。我们还尝试使用生成的转述来提高技能。我们尝试在技能的语音交互模型中 (i) 只包含揭示错误的转述,(ii) 包含所有有效的转述。我们发现,仅包含揭示错误的转述有时不足以使所有测试通过。
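The core check can be illustrated with a toy example. Everything below is hypothetical (a two-intent keyword resolver, not a real Alexa skill or VUI-UPSET's implementation): a generated paraphrase that fails to resolve to the expected intent is flagged as bug-revealing, i.e., it exposes a gap in the skill's voice interaction model.

```python
def resolve_intent(utterance: str) -> str:
    """Toy VUI intent resolver (illustrative only)."""
    text = utterance.lower()
    if "play" in text and "music" in text:
        return "PlayMusicIntent"
    return "FallbackIntent"

def bug_revealing_paraphrases(paraphrases, expected_intent):
    """Sketch of the check: a paraphrase that does not resolve to the
    expected intent reveals a gap in the voice interaction model."""
    return [p for p in paraphrases if resolve_intent(p) != expected_intent]

paraphrases = ["play some music", "put on a song", "play music please"]
print(bug_revealing_paraphrases(paraphrases, "PlayMusicIntent"))
# ['put on a song']
```

Adding the flagged paraphrase to the interaction model (so it also maps to `PlayMusicIntent`) is the repair step the abstract evaluates; as the authors note, adding only bug-revealing paraphrases may still leave some tests failing.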
Citations: 0