Information and Software Technology最新文献_第7页

Qubernetes: Towards a unified cloud-native execution platform for hybrid classic-quantum computing Qubernetes：为经典-量子混合计算打造统一的云原生执行平台

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-26 DOI: 10.1016/j.infsof.2024.107529

Vlad Stirbu, Otso Kinanen, Majid Haghparast, Tommi Mikkonen

Context:

The emergence of quantum computing proposes a revolutionary paradigm that can radically transform numerous scientific and industrial application domains. The ability of quantum computers to scale computations beyond what the current computers are capable of implies better performance and efficiency for certain algorithmic tasks.

Objective:

However, to benefit from such improvement, quantum computers must be integrated with existing software systems, a process that is not straightforward. In this paper, we propose a unified execution model that addresses the challenges that emerge from building hybrid classical-quantum applications at scale.

Method:

Following the Design Science Research methodology, we proposed a convention for mapping quantum resources and artifacts to Kubernetes concepts. Then, in an experimental Kubernetes cluster, we conducted experiments for scheduling and executing quantum tasks on both quantum simulators and hardware.

Results:

The experimental results demonstrate that the proposed platform Qubernetes (or Kubernetes for quantum) exposes the quantum computation tasks and hardware capabilities following established cloud-native principles, allowing seamless integration into the larger Kubernetes ecosystem.

Conclusion:

The quantum computing potential cannot be realized without seamless integration into classical computing. By validating that it is practical to execute quantum tasks in a Kubernetes infrastructure, we pave the way for leveraging the existing Kubernetes ecosystem as an enabler for hybrid classical-quantum computing.

背景：量子计算的出现提出了一种革命性的范式，可以从根本上改变众多科学和工业应用领域。目标：然而，要从这种改进中获益，量子计算机必须与现有软件系统集成，而这一过程并不简单。在本文中，我们提出了一种统一的执行模型，以解决在大规模构建经典-量子混合应用过程中出现的挑战。方法：按照设计科学研究方法，我们提出了一种将量子资源和工件映射到 Kubernetes 概念的约定。结果：实验结果表明，拟议的平台Qubernetes（或量子Kubernetes）按照既定的云原生原则公开了量子计算任务和硬件能力，允许无缝集成到更大的Kubernetes生态系统中。通过验证在 Kubernetes 基础架构中执行量子任务是切实可行的，我们为利用现有 Kubernetes 生态系统推动经典-量子混合计算铺平了道路。

{"title":"Qubernetes: Towards a unified cloud-native execution platform for hybrid classic-quantum computing","authors":"Vlad Stirbu, Otso Kinanen, Majid Haghparast, Tommi Mikkonen","doi":"10.1016/j.infsof.2024.107529","DOIUrl":"10.1016/j.infsof.2024.107529","url":null,"abstract":"<div><h3>Context:</h3><p>The emergence of quantum computing proposes a revolutionary paradigm that can radically transform numerous scientific and industrial application domains. The ability of quantum computers to scale computations beyond what the current computers are capable of implies better performance and efficiency for certain algorithmic tasks.</p></div><div><h3>Objective:</h3><p>However, to benefit from such improvement, quantum computers must be integrated with existing software systems, a process that is not straightforward. In this paper, we propose a unified execution model that addresses the challenges that emerge from building hybrid classical-quantum applications at scale.</p></div><div><h3>Method:</h3><p>Following the Design Science Research methodology, we proposed a convention for mapping quantum resources and artifacts to Kubernetes concepts. Then, in an experimental Kubernetes cluster, we conducted experiments for scheduling and executing quantum tasks on both quantum simulators and hardware.</p></div><div><h3>Results:</h3><p>The experimental results demonstrate that the proposed platform Qubernetes (or Kubernetes for quantum) exposes the quantum computation tasks and hardware capabilities following established cloud-native principles, allowing seamless integration into the larger Kubernetes ecosystem.</p></div><div><h3>Conclusion:</h3><p>The quantum computing potential cannot be realized without seamless integration into classical computing. By validating that it is practical to execute quantum tasks in a Kubernetes infrastructure, we pave the way for leveraging the existing Kubernetes ecosystem as an enabler for hybrid classical-quantum computing.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107529"},"PeriodicalIF":3.8,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001344/pdfft?md5=f5a427a08f3dbd8b3b7ccb4f62b577ea&pid=1-s2.0-S0950584924001344-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141838848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Extraction and empirical evaluation of GUI-level invariants as GUI Oracles in mobile app testing 在移动应用程序测试中提取图形用户界面级不变式并对其进行实证评估

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-23 DOI: 10.1016/j.infsof.2024.107531

Ali Asghar Yarifard , Saeed Araban , Samad Paydar , Vahid Garousi , Maurizio Morisio , Riccardo Coppola

Context

Mobile apps (software) are used in almost all aspects of daily life by billions of people. Given the widespread use of mobile apps in various domains, the demand for systematic testing of their Graphical User Interfaces (GUI) is crucial. Despite the significant advances in automated mobile app testing over the last decade, certain challenges remain, most notably the app-specific GUI test-oracle problem, which can significantly hinder the effective detection of defects in mobile apps. In this study, we introduce the use of GUI-level invariants, referred to as GUI invariants, as app-specific GUI oracles in GUI test cases to address this challenge.

Methods

We propose a semi-automatic solution to extract GUI invariants and use them as app-specific GUI oracles in test cases. We use the mutation testing technique to evaluate the (fault detection) effectiveness of the GUI oracles used. In addition, we evaluate their quality aspects, namely correctness, understandability, and compatibility, from the perspective of human experts using a questionnaire survey.

Results

The empirical results show that the GUI oracles used are effective and helpful, as they improved the fault-detection effectiveness of the empirical test suites ranging from 18% to 32%. These results also highlight the efficacy of GUI oracles used in identifying various defects, including crashing and non-crashing functional issues, and surpassing the performance of existing tools in fault-detection rates. Additionally, the questionnaire survey outcomes indicate that the GUI oracles used are correct, understandable, and compatible.

Conclusions

Based on the empirical results, we can conclude that using GUI invariants as GUI oracles can be useful and effective in mobile app testing.

背景移动应用程序（软件）几乎应用于数十亿人日常生活的方方面面。鉴于移动应用程序在各个领域的广泛应用，对其图形用户界面（GUI）进行系统测试的需求至关重要。尽管自动移动应用程序测试在过去十年中取得了重大进展，但某些挑战依然存在，其中最突出的是特定于应用程序的图形用户界面测试障碍问题，该问题会严重阻碍移动应用程序缺陷的有效检测。在本研究中，我们介绍了在图形用户界面测试用例中使用图形用户界面级不变式（简称为图形用户界面不变式）作为特定于应用程序的图形用户界面oracle，以应对这一挑战。我们使用突变测试技术来评估所使用的图形用户界面规则的（故障检测）有效性。此外，我们还通过问卷调查从人类专家的角度评估了它们的质量方面，即正确性、可理解性和兼容性。结果实证结果表明，所使用的图形用户界面特例是有效和有用的，因为它们提高了实证测试套件的故障检测效率，提高幅度从 18% 到 32% 不等。这些结果还凸显了所使用的图形用户界面工具在识别各种缺陷（包括崩溃和非崩溃功能问题）方面的功效，以及在故障检测率方面超越现有工具的性能。此外，问卷调查结果表明，所使用的图形用户界面规则是正确的、可理解的和兼容的。

{"title":"Extraction and empirical evaluation of GUI-level invariants as GUI Oracles in mobile app testing","authors":"Ali Asghar Yarifard , Saeed Araban , Samad Paydar , Vahid Garousi , Maurizio Morisio , Riccardo Coppola","doi":"10.1016/j.infsof.2024.107531","DOIUrl":"10.1016/j.infsof.2024.107531","url":null,"abstract":"<div><h3>Context</h3><p>Mobile apps (software) are used in almost all aspects of daily life by billions of people. Given the widespread use of mobile apps in various domains, the demand for systematic testing of their Graphical User Interfaces (GUI) is crucial. Despite the significant advances in automated mobile app testing over the last decade, certain challenges remain, most notably the app-specific GUI test-oracle problem, which can significantly hinder the effective detection of defects in mobile apps. In this study, we introduce the use of GUI-level invariants, referred to as GUI invariants, as app-specific GUI oracles in GUI test cases to address this challenge.</p></div><div><h3>Methods</h3><p>We propose a semi-automatic solution to extract GUI invariants and use them as app-specific GUI oracles in test cases. We use the mutation testing technique to evaluate the (fault detection) effectiveness of the GUI oracles used. In addition, we evaluate their quality aspects, namely correctness, understandability, and compatibility, from the perspective of human experts using a questionnaire survey.</p></div><div><h3>Results</h3><p>The empirical results show that the GUI oracles used are effective and helpful, as they improved the fault-detection effectiveness of the empirical test suites ranging from 18% to 32%. These results also highlight the efficacy of GUI oracles used in identifying various defects, including crashing and non-crashing functional issues, and surpassing the performance of existing tools in fault-detection rates. Additionally, the questionnaire survey outcomes indicate that the GUI oracles used are correct, understandable, and compatible.</p></div><div><h3>Conclusions</h3><p>Based on the empirical results, we can conclude that using GUI invariants as GUI oracles can be useful and effective in mobile app testing.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"177 ","pages":"Article 107531"},"PeriodicalIF":3.8,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001368/pdfft?md5=df482f7792ae43af274444769943b80c&pid=1-s2.0-S0950584924001368-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141844371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A PRISMA-driven systematic mapping study on system assurance weakeners 关于系统保证削弱因素的 PRISMA 驱动型系统映射研究

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-20 DOI: 10.1016/j.infsof.2024.107526

Kimya Khakzad Shahandashti , Alvine B. Belle , Timothy C. Lethbridge , Oluwafemi Odu , Mithila Sivakumar

Context:

An assurance case is a structured hierarchy of claims aiming at demonstrating that a mission-critical system supports specific requirements (e.g., safety, security, privacy). The presence of assurance weakeners (i.e., assurance deficits, logical fallacies) in assurance cases reflects insufficient evidence, knowledge, or gaps in reasoning. These weakeners can undermine confidence in assurance arguments, potentially hindering the verification of mission-critical system capabilities which could result in catastrophic outcomes (e.g., loss of lives). Given the growing interest in employing assurance cases to ensure that systems are developed to meet their requirements, exploring the management of assurance weakeners becomes beneficial.

Objective:

As a stepping stone for future research on assurance weakeners, we aim to initiate the first comprehensive systematic mapping study on this subject.

Methods:

We followed the well-established PRISMA 2020 and SEGRESS guidelines to conduct our systematic mapping study. We searched for primary studies in five digital libraries and focused on the 2012–2023 publication year range. Our selection criteria focused on studies addressing assurance weakeners from a qualitative standpoint, resulting in the inclusion of 39 primary studies in our systematic review.

Results:

Our systematic mapping study reports a taxonomy (map) that provides a uniform categorization of assurance weakeners and approaches proposed to manage them from a qualitative perspective. The taxonomy classifies weakeners in four categories: aleatory, epistemic, ontological, and argument uncertainty. Additionally, it classifies approaches supporting the management of weakeners in three main categories: representation, identification and mitigation approaches.

Conclusion:

Our study findings suggest that the SACM (Structured Assurance Case Metamodel) – a standard specified by the OMG (Object Management Group) – offers a comprehensive range of capabilities to capture structured arguments and reason about their potential assurance weakeners. Our findings also suggest novel assurance weakener management approaches should be proposed to better assure mission-critical systems.

内涵：保证案例是一个结构化的索赔层次，旨在证明关键任务系统支持特定要求（如安全、保安、隐私）。保证案例中存在的保证弱化（即保证缺陷、逻辑谬误）反映了证据、知识或推理中的不足。这些削弱因素会破坏对保证论证的信心，可能会阻碍对关键任务系统能力的验证，从而导致灾难性后果（如生命损失）。目标：作为未来对保证弱化因素进行研究的垫脚石，我们的目标是启动关于该主题的第一项全面的系统映射研究。方法：我们遵循成熟的 PRISMA 2020 和 SEGRESS 指南开展系统映射研究。我们在五个数字图书馆中搜索了主要研究，并将重点放在 2012-2023 年的出版年份范围内。结果：我们的系统图谱研究报告了一个分类法（图谱），该分类法从定性的角度对保证弱化因素和管理这些弱化因素的方法进行了统一分类。该分类法将弱化因素分为四类：假定性、认识论、本体论和论证不确定性。结论：我们的研究结果表明，结构化保证案例元模型（SACM，Structured Assurance Case Metamodel）--OMG（对象管理组织）指定的标准--提供了全面的功能，可以捕获结构化论证并推理其潜在的保证弱化。我们的研究结果还表明，应提出新的保证弱化物管理方法，以更好地保证关键任务系统。

{"title":"A PRISMA-driven systematic mapping study on system assurance weakeners","authors":"Kimya Khakzad Shahandashti , Alvine B. Belle , Timothy C. Lethbridge , Oluwafemi Odu , Mithila Sivakumar","doi":"10.1016/j.infsof.2024.107526","DOIUrl":"10.1016/j.infsof.2024.107526","url":null,"abstract":"<div><h3>Context:</h3><p>An assurance case is a structured hierarchy of claims aiming at demonstrating that a mission-critical system supports specific requirements (e.g., safety, security, privacy). The presence of assurance weakeners (i.e., assurance deficits, logical fallacies) in assurance cases reflects insufficient evidence, knowledge, or gaps in reasoning. These weakeners can undermine confidence in assurance arguments, potentially hindering the verification of mission-critical system capabilities which could result in catastrophic outcomes (e.g., loss of lives). Given the growing interest in employing assurance cases to ensure that systems are developed to meet their requirements, exploring the management of assurance weakeners becomes beneficial.</p></div><div><h3>Objective:</h3><p>As a stepping stone for future research on assurance weakeners, we aim to initiate the first comprehensive systematic mapping study on this subject.</p></div><div><h3>Methods:</h3><p>We followed the well-established PRISMA 2020 and SEGRESS guidelines to conduct our systematic mapping study. We searched for primary studies in five digital libraries and focused on the 2012–2023 publication year range. Our selection criteria focused on studies addressing assurance weakeners from a qualitative standpoint, resulting in the inclusion of 39 primary studies in our systematic review.</p></div><div><h3>Results:</h3><p>Our systematic mapping study reports a taxonomy (map) that provides a uniform categorization of assurance weakeners and approaches proposed to manage them from a qualitative perspective. The taxonomy classifies weakeners in four categories: aleatory, epistemic, ontological, and argument uncertainty. Additionally, it classifies approaches supporting the management of weakeners in three main categories: representation, identification and mitigation approaches.</p></div><div><h3>Conclusion:</h3><p>Our study findings suggest that the SACM (Structured Assurance Case Metamodel) – a standard specified by the OMG (Object Management Group) – offers a comprehensive range of capabilities to capture structured arguments and reason about their potential assurance weakeners. Our findings also suggest novel assurance weakener management approaches should be proposed to better assure mission-critical systems.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107526"},"PeriodicalIF":3.8,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001319/pdfft?md5=5f5782fecf500fafd6a0caffa9ef549f&pid=1-s2.0-S0950584924001319-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141846295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An empirical study on compatibility issues in Android API field evolution 关于安卓应用程序接口领域演进中兼容性问题的实证研究

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-20 DOI: 10.1016/j.infsof.2024.107530

Tarek Mahmud , Meiru Che , Guowei Yang

Context:

The continuous evolution of the Android operating system requires regular API updates, which may affect the functionality of Android apps. This is becoming increasingly common due to the frequent evolution of the Android platform, which introduces new APIs and deprecates existing ones. Recent studies investigated API evolution to ensure the reliability of Android apps; however, they focused on API methods alone.

Objectives:

This study aims to understand how API fields evolve and how this affects API compatibility in real-world Android apps and their development.

Method:

We perform an empirical study on compatibility issues in Android API field evolution by analyzing the nature and resolution of these issues across 681 open-source Android apps.

Results:

Our experimental results yield interesting findings: (1) On average two API field compatibility issues exist per app in each tag; (2) Although API method evolution and API field evolution are related, current API method-level analysis techniques may fail to detect numerous API field compatibility issues; (3) Different types of checks are preferred when addressing different types of compatibility issues; (4) It takes on average three and a half months for an API field compatibility issue to get fixed since when it is introduced; (5) Developers pay proper attention to API field compatibility issues and address them soon after becoming aware of them in the apps.

Conclusion:

These findings highlight the significance of including API fields in future research on API evolution and can assist developers and researchers in understanding, detecting, and handling compatibility issues in API field evolution.

背景：Android 操作系统的不断演进要求定期更新 API，这可能会影响 Android 应用程序的功能。由于安卓平台的频繁演进，引入了新的应用程序接口并淘汰了现有的应用程序接口，这种情况变得越来越普遍。本研究旨在了解API字段是如何演变的，以及这对现实世界中Android应用程序及其开发中的API兼容性有何影响。方法：我们通过分析681个开源Android应用程序中这些问题的性质和解决方案，对Android API字段演变中的兼容性问题进行了实证研究。结果：我们的实验结果得出了有趣的发现：（1）每个应用程序的每个标签中平均存在两个API字段兼容性问题；（2）虽然API方法演进和API字段演进是相关的，但当前的API方法级分析技术可能无法检测到大量的API字段兼容性问题；（3）在解决不同类型的兼容性问题时，不同类型的检查是首选；（4）一个API字段兼容性问题从出现到得到解决平均需要三个半月的时间；（5）开发人员会适当关注API字段兼容性问题，并在意识到应用程序中存在这些问题后尽快解决它们。结论：这些发现凸显了将 API 字段纳入未来 API 演进研究的重要性，有助于开发人员和研究人员了解、检测和处理 API 字段演进中的兼容性问题。

{"title":"An empirical study on compatibility issues in Android API field evolution","authors":"Tarek Mahmud , Meiru Che , Guowei Yang","doi":"10.1016/j.infsof.2024.107530","DOIUrl":"10.1016/j.infsof.2024.107530","url":null,"abstract":"<div><h3>Context:</h3><p>The continuous evolution of the Android operating system requires regular API updates, which may affect the functionality of Android apps. This is becoming increasingly common due to the frequent evolution of the Android platform, which introduces new APIs and deprecates existing ones. Recent studies investigated API evolution to ensure the reliability of Android apps; however, they focused on API methods alone.</p></div><div><h3>Objectives:</h3><p>This study aims to understand how API fields evolve and how this affects API compatibility in real-world Android apps and their development.</p></div><div><h3>Method:</h3><p>We perform an empirical study on compatibility issues in Android API field evolution by analyzing the nature and resolution of these issues across 681 open-source Android apps.</p></div><div><h3>Results:</h3><p>Our experimental results yield interesting findings: (1) On average two API field compatibility issues exist per app in each tag; (2) Although API method evolution and API field evolution are related, current API method-level analysis techniques may fail to detect numerous API field compatibility issues; (3) Different types of checks are preferred when addressing different types of compatibility issues; (4) It takes on average three and a half months for an API field compatibility issue to get fixed since when it is introduced; (5) Developers pay proper attention to API field compatibility issues and address them soon after becoming aware of them in the apps.</p></div><div><h3>Conclusion:</h3><p>These findings highlight the significance of including API fields in future research on API evolution and can assist developers and researchers in understanding, detecting, and handling compatibility issues in API field evolution.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107530"},"PeriodicalIF":3.8,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141846783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The quantum frontier of software engineering: A systematic mapping study 软件工程的量子前沿：系统映射研究

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-18 DOI: 10.1016/j.infsof.2024.107525

Manuel De Stefano, Fabiano Pecorelli, Dario Di Nucci, Fabio Palomba, Andrea De Lucia

Context:

Quantum computing is becoming a reality, and quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs.

Objective:

This paper presents a systematic mapping study of the current state of QSE research, aiming to identify the most investigated topics, the types and number of studies, the main reported results, and the most studied quantum computing tools/frameworks. Additionally, the study aims to explore the research community’s interest in QSE, how it has evolved, and any prior contributions to the discipline before its formal introduction through the Talavera Manifesto.

Method:

We searched for relevant articles in several databases and applied inclusion and exclusion criteria to select the most relevant studies. After evaluating the quality of the selected resources, we extracted relevant data from the primary studies and analyzed them.

Results:

We found that QSE research has primarily focused on software testing, with little attention given to other topics, such as software engineering management. The most commonly studied technology for techniques and tools is Qiskit, although, in most studies, either multiple or none specific technologies were employed. The researchers most interested in QSE are interconnected through direct collaborations, and several strong collaboration clusters have been identified. Most articles in QSE have been published in non-thematic venues, with a preference for conferences.

Conclusions:

The study’s implications are providing a centralized source of information for researchers and practitioners in the field, facilitating knowledge transfer, and contributing to the advancement and growth of QSE.

量子计算正在成为现实，而量子软件工程（QSE）正在成为一门新学科，使开发人员能够设计和开发量子程序。本文对量子软件工程的研究现状进行了系统的摸底研究，旨在确定调查最多的主题、研究类型和数量、主要报告结果以及研究最多的量子计算工具/框架。此外，本研究还旨在探索研究界对 QSE 的兴趣、QSE 的发展历程，以及在通过《塔拉韦拉宣言》正式引入 QSE 之前对该学科做出的任何贡献。我们在多个数据库中搜索了相关文章，并采用纳入和排除标准来选择最相关的研究。在对所选资源的质量进行评估后，我们从主要研究中提取了相关数据并对其进行了分析。我们发现，QSE 研究主要集中在软件测试方面，很少关注其他主题，如软件工程管理。在技术和工具方面，最常研究的技术是 Qiskit，不过在大多数研究中，要么采用了多种特定技术，要么没有采用任何特定技术。对 QSE 最感兴趣的研究人员是通过直接合作相互联系的，并且已经确定了几个强大的合作集群。大多数有关 QSE 的文章都发表在非主题性的刊物上，而且更倾向于在会议上发表。这项研究的意义在于为该领域的研究人员和从业人员提供一个集中的信息来源，促进知识转移，并为 QSE 的进步和发展做出贡献。

{"title":"The quantum frontier of software engineering: A systematic mapping study","authors":"Manuel De Stefano, Fabiano Pecorelli, Dario Di Nucci, Fabio Palomba, Andrea De Lucia","doi":"10.1016/j.infsof.2024.107525","DOIUrl":"10.1016/j.infsof.2024.107525","url":null,"abstract":"<div><h3>Context:</h3><p>Quantum computing is becoming a reality, and quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs.</p></div><div><h3>Objective:</h3><p>This paper presents a systematic mapping study of the current state of QSE research, aiming to identify the most investigated topics, the types and number of studies, the main reported results, and the most studied quantum computing tools/frameworks. Additionally, the study aims to explore the research community’s interest in QSE, how it has evolved, and any prior contributions to the discipline before its formal introduction through the Talavera Manifesto.</p></div><div><h3>Method:</h3><p>We searched for relevant articles in several databases and applied inclusion and exclusion criteria to select the most relevant studies. After evaluating the quality of the selected resources, we extracted relevant data from the primary studies and analyzed them.</p></div><div><h3>Results:</h3><p>We found that QSE research has primarily focused on software testing, with little attention given to other topics, such as software engineering management. The most commonly studied technology for techniques and tools is Qiskit, although, in most studies, either multiple or none specific technologies were employed. The researchers most interested in QSE are interconnected through direct collaborations, and several strong collaboration clusters have been identified. Most articles in QSE have been published in non-thematic venues, with a preference for conferences.</p></div><div><h3>Conclusions:</h3><p>The study’s implications are providing a centralized source of information for researchers and practitioners in the field, facilitating knowledge transfer, and contributing to the advancement and growth of QSE.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107525"},"PeriodicalIF":3.8,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001307/pdfft?md5=d0ccf8db440c2e6dfde8fed9283c96a6&pid=1-s2.0-S0950584924001307-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An exploratory study on just-in-time multi-programming-language bug prediction 关于即时多编程语言错误预测的探索性研究

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-18 DOI: 10.1016/j.infsof.2024.107524

Zengyang Li , Jiabao Ji , Peng Liang , Ran Mo , Hui Liu

Context:

An increasing number of software systems are written in multiple programming languages (PLs), which are called multi-programming-language (MPL) systems. MPL bugs (MPLBs) refers to the bugs whose resolution involves multiple PLs. Despite high complexity of MPLB resolution, there lacks MPLB prediction methods.

Objective:

This work aims to construct just-in-time (JIT) MPLB prediction models with selected prediction metrics, analyze the significance of the metrics, and then evaluate the performance of cross-project JIT MPLB prediction.

Methods:

We develop JIT MPLB prediction models with the selected metrics using machine learning algorithms and evaluate the models in within-project and cross-project contexts with our constructed dataset based on 18 Apache MPL projects.

Results:

Random Forest is appropriate for JIT MPLB prediction. Changed LOC of all files, added LOC of all files, and the total number of lines of all files of the project currently are the most crucial metrics in JIT MPLB prediction. The prediction models can be simplified using a few top-ranked metrics. Training on the dataset from multiple projects can yield significantly higherAUC than training on the dataset from a single project for cross-project JIT MPLB prediction.

Conclusions:

JIT MPLB prediction models can be constructed with the selected set of metrics, which can be reduced to build simplified JIT MPLB prediction models, and cross-project JIT MPLB prediction is feasible.

背景：越来越多的软件系统使用多种编程语言（PL）编写，这些系统被称为多编程语言（MPL）系统。多编程语言错误（MPLBs）是指其解决涉及多个编程语言的错误。尽管 MPLB 解决的复杂性很高，但却缺乏 MPLB 预测方法。目的：本工作旨在利用选定的预测指标构建及时（JIT）MPLB 预测模型，分析指标的重要性，然后评估跨项目 JIT MPLB 预测的性能。方法：我们使用机器学习算法开发了具有选定指标的 JIT MPLB 预测模型，并使用我们基于 18 个 Apache MPL 项目构建的数据集在项目内和跨项目情况下对模型进行了评估。所有文件的已更改 LOC、所有文件的已添加 LOC 和当前项目所有文件的总行数是 JIT MPLB 预测中最关键的指标。预测模型可以通过几个排名靠前的指标来简化。在跨项目 JIT MPLB 预测中，在多个项目的数据集上进行训练可获得比在单个项目的数据集上进行训练高得多的 AUC。结论：可以使用所选的指标集构建 JIT MPLB 预测模型，并可通过缩减指标集来构建简化的 JIT MPLB 预测模型，而且跨项目 JIT MPLB 预测是可行的。

{"title":"An exploratory study on just-in-time multi-programming-language bug prediction","authors":"Zengyang Li , Jiabao Ji , Peng Liang , Ran Mo , Hui Liu","doi":"10.1016/j.infsof.2024.107524","DOIUrl":"10.1016/j.infsof.2024.107524","url":null,"abstract":"<div><h3>Context:</h3><p>An increasing number of software systems are written in multiple programming languages (PLs), which are called multi-programming-language (MPL) systems. MPL bugs (MPLBs) refers to the bugs whose resolution involves multiple PLs. Despite high complexity of MPLB resolution, there lacks MPLB prediction methods.</p></div><div><h3>Objective:</h3><p>This work aims to construct just-in-time (JIT) MPLB prediction models with selected prediction metrics, analyze the significance of the metrics, and then evaluate the performance of cross-project JIT MPLB prediction.</p></div><div><h3>Methods:</h3><p>We develop JIT MPLB prediction models with the selected metrics using machine learning algorithms and evaluate the models in within-project and cross-project contexts with our constructed dataset based on 18 Apache MPL projects.</p></div><div><h3>Results:</h3><p>Random Forest is appropriate for JIT MPLB prediction. Changed LOC of all files, added LOC of all files, and the total number of lines of all files of the project currently are the most crucial metrics in JIT MPLB prediction. The prediction models can be simplified using a few top-ranked metrics. Training on the dataset from multiple projects can yield significantly higherAUC than training on the dataset from a single project for cross-project JIT MPLB prediction.</p></div><div><h3>Conclusions:</h3><p>JIT MPLB prediction models can be constructed with the selected set of metrics, which can be reduced to build simplified JIT MPLB prediction models, and cross-project JIT MPLB prediction is feasible.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107524"},"PeriodicalIF":3.8,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141853189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

SFIDMT-ART: A metamorphic group generation method based on Adaptive Random Testing applied to source and follow-up input domains SFIDMT-ART：基于自适应随机测试的变态群生成方法，适用于源输入域和后续输入域

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-18 DOI: 10.1016/j.infsof.2024.107528

Zhihao Ying , Dave Towey , Anthony Graham Bellotti , Tsong Yueh Chen , Zhi Quan Zhou

Context:

The performance of metamorphic testing relates strongly to the quality of test cases. However, most related research has only focused on source test cases, ignoring follow-up test cases to some extent. In this paper, we identify a potential problem that may be encountered with existing metamorphic group generation algorithms. We then propose a possible solution to address this problem. Based on this solution, we design a new algorithm for generating effective source and follow-up test cases.

Objective:

To improve the performance (test effectiveness and efficiency) of metamorphic testing.

Methods:

We introduce the concept of the input-domain difference problem, which is likely to affect the performance of metamorphic group generation algorithms. We propose a new test-case distribution criterion for metamorphic testing to address this problem. Based on our proposed criterion, we further present a new metamorphic group generation algorithm, from a black-box perspective, with new distance metrics to facilitate this algorithm.

Results:

Our algorithm performs significantly better than existing algorithms, in terms of test effectiveness, efficiency and test-case diversity.

Conclusions:

Through experiments, we find that the input-domain difference problem is likely to affect the performance of metamorphic group generation algorithms. The experimental results demonstrate that our algorithm can achieve good test efficiency, effectiveness, and test-case diversity.

背景：变形测试的性能与测试用例的质量密切相关。然而，大多数相关研究只关注源测试用例，在一定程度上忽略了后续测试用例。在本文中，我们指出了现有的元组生成算法可能会遇到的潜在问题。然后，我们提出了解决这一问题的可行方案。方法：我们引入了输入域差异问题的概念，该问题可能会影响元组生成算法的性能。针对这一问题，我们提出了一种新的变形测试测试用例分布标准。结果：我们的算法在测试效果、效率和测试用例多样性方面的表现明显优于现有算法。结论：通过实验，我们发现输入域差异问题很可能会影响变态组生成算法的性能。实验结果表明，我们的算法可以实现良好的测试效率、效果和测试用例多样性。

{"title":"SFIDMT-ART: A metamorphic group generation method based on Adaptive Random Testing applied to source and follow-up input domains","authors":"Zhihao Ying , Dave Towey , Anthony Graham Bellotti , Tsong Yueh Chen , Zhi Quan Zhou","doi":"10.1016/j.infsof.2024.107528","DOIUrl":"10.1016/j.infsof.2024.107528","url":null,"abstract":"<div><h3>Context:</h3><p>The performance of metamorphic testing relates strongly to the quality of test cases. However, most related research has only focused on source test cases, ignoring follow-up test cases to some extent. In this paper, we identify a potential problem that may be encountered with existing metamorphic group generation algorithms. We then propose a possible solution to address this problem. Based on this solution, we design a new algorithm for generating effective source and follow-up test cases.</p></div><div><h3>Objective:</h3><p>To improve the performance (test effectiveness and efficiency) of metamorphic testing.</p></div><div><h3>Methods:</h3><p>We introduce the concept of the input-domain difference problem, which is likely to affect the performance of metamorphic group generation algorithms. We propose a new test-case distribution criterion for metamorphic testing to address this problem. Based on our proposed criterion, we further present a new metamorphic group generation algorithm, from a black-box perspective, with new distance metrics to facilitate this algorithm.</p></div><div><h3>Results:</h3><p>Our algorithm performs significantly better than existing algorithms, in terms of test effectiveness, efficiency and test-case diversity.</p></div><div><h3>Conclusions:</h3><p>Through experiments, we find that the input-domain difference problem is likely to affect the performance of metamorphic group generation algorithms. The experimental results demonstrate that our algorithm can achieve good test efficiency, effectiveness, and test-case diversity.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107528"},"PeriodicalIF":3.8,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001332/pdfft?md5=47d3eca48ca9d0d5eb3854e033268e84&pid=1-s2.0-S0950584924001332-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141844581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Cross-Modal Retrieval-enhanced code Summarization based on joint learning for retrieval and generation 基于检索和生成联合学习的跨模态检索增强代码摘要

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-18 DOI: 10.1016/j.infsof.2024.107527

Lixuan Li , Bin Liang , Lin Chen , Xiaofang Zhang

Context:

Code summarization refers to a task that automatically generates a natural language description of a code snippet to facilitate code comprehension. Existing methods have achieved satisfactory results by incorporating information retrieval into generative deep-learning models for reusing summaries of existing code. However, most of these existing methods employed non-learnable generic retrieval methods for content-based retrieval, resulting in a lack of diversity in the retrieved results during training, thereby making the model over-reliant on retrieved results and reducing the generative model’s ability to generalize to unknown samples.

Objective:

To address this issue, this paper introduces CMR-Sum: a novel Cross-Modal Retrieval-enhanced code Summarization framework based on joint learning for generation and retrieval tasks, where both two tasks are allowed to be optimized simultaneously.

Method:

Specifically, we use a cross-modal retrieval module to dynamically alter retrieval results during training, which enhances the diversity of the retrieved results and maintains a relative balance between the two tasks. Furthermore, in the summary generation phase, we employ a cross-attention mechanism to generate code summaries based on the alignment between retrieved and generated summaries. We conducted experiments on three real-world datasets, comparing the performance of our method with baseline models. Additionally, we performed extensive qualitative analysis.

Result:

Results from qualitative and quantitative experiments indicate that our approach effectively enhances the performance of code summarization. Our method outperforms both the generation-based and the retrieval-enhanced baselines. Further ablation experiments demonstrate the effectiveness of each component of our method. Results from sensitivity analysis experiments suggest that our approach achieves good performance without requiring extensive hyper-parameter search.

Conclusion:

The direction of utilizing retrieval-enhanced generation tasks shows great potential. It is essential to increase the diversity of retrieval results during the training process, which is crucial for improving the generality and the performance of the model.

背景：代码摘要是指自动生成代码片段的自然语言描述以促进代码理解的任务。现有的方法通过将信息检索纳入生成式深度学习模型来重用现有代码摘要，取得了令人满意的效果。然而，这些现有方法大多采用不可学习的通用检索方法进行基于内容的检索，导致训练过程中检索结果缺乏多样性，从而使模型过度依赖检索结果，降低了生成模型对未知样本的泛化能力。方法：具体来说，我们使用一个跨模态检索模块，在训练过程中动态改变检索结果，从而增强检索结果的多样性，并保持两个任务之间的相对平衡。此外，在摘要生成阶段，我们采用交叉关注机制，根据检索到的摘要和生成的摘要之间的一致性生成代码摘要。我们在三个实际数据集上进行了实验，比较了我们的方法与基准模型的性能。结果：定性和定量实验的结果表明，我们的方法有效地提高了代码摘要的性能。我们的方法优于基于生成和检索增强的基线方法。进一步的消融实验证明了我们方法每个组成部分的有效性。敏感性分析实验的结果表明，我们的方法无需进行大量的超参数搜索即可实现良好的性能。在训练过程中，必须增加检索结果的多样性，这对提高模型的通用性和性能至关重要。

{"title":"Cross-Modal Retrieval-enhanced code Summarization based on joint learning for retrieval and generation","authors":"Lixuan Li , Bin Liang , Lin Chen , Xiaofang Zhang","doi":"10.1016/j.infsof.2024.107527","DOIUrl":"10.1016/j.infsof.2024.107527","url":null,"abstract":"<div><h3>Context:</h3><p>Code summarization refers to a task that automatically generates a natural language description of a code snippet to facilitate code comprehension. Existing methods have achieved satisfactory results by incorporating information retrieval into generative deep-learning models for reusing summaries of existing code. However, most of these existing methods employed non-learnable generic retrieval methods for content-based retrieval, resulting in a lack of diversity in the retrieved results during training, thereby making the model over-reliant on retrieved results and reducing the generative model’s ability to generalize to unknown samples.</p></div><div><h3>Objective:</h3><p>To address this issue, this paper introduces CMR-Sum: a novel Cross-Modal Retrieval-enhanced code Summarization framework based on joint learning for generation and retrieval tasks, where both two tasks are allowed to be optimized simultaneously.</p></div><div><h3>Method:</h3><p>Specifically, we use a cross-modal retrieval module to dynamically alter retrieval results during training, which enhances the diversity of the retrieved results and maintains a relative balance between the two tasks. Furthermore, in the summary generation phase, we employ a cross-attention mechanism to generate code summaries based on the alignment between retrieved and generated summaries. We conducted experiments on three real-world datasets, comparing the performance of our method with baseline models. Additionally, we performed extensive qualitative analysis.</p></div><div><h3>Result:</h3><p>Results from qualitative and quantitative experiments indicate that our approach effectively enhances the performance of code summarization. Our method outperforms both the generation-based and the retrieval-enhanced baselines. Further ablation experiments demonstrate the effectiveness of each component of our method. Results from sensitivity analysis experiments suggest that our approach achieves good performance without requiring extensive hyper-parameter search.</p></div><div><h3>Conclusion:</h3><p>The direction of utilizing retrieval-enhanced generation tasks shows great potential. It is essential to increase the diversity of retrieval results during the training process, which is crucial for improving the generality and the performance of the model.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107527"},"PeriodicalIF":3.8,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141851202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

GitHub marketplace for automation and innovation in software production 促进软件生产自动化和创新的 GitHub 市场

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-14 DOI: 10.1016/j.infsof.2024.107522

SK. Golam Saroar, Waseefa Ahmed, Elmira Onagh, Maleknaz Nayebi

Context:

GitHub, renowned for facilitating collaborative code version control and software production in software teams, expanded its services in 2017 by introducing GitHub Marketplace. This online platform hosts automation tools to assist developers with the production of their GitHub-hosted projects, and it has become a valuable source of information on the tools used in the Open Source Software (OSS) community.

Objective:

In this exploratory study, we introduce GitHub Marketplace as a software marketplace by exploring the Characteristics, Features, and Policies of the platform comprehensively, identifying common themes in production automation. Further, we explore popular tools among practitioners and researchers and highlight disparities in the approach to these tools between industry and academia.

Method:

We adopted the conceptual framework of software app stores from previous studies and used that to examine 8,318 automated production tools (440 Apps and 7,878 Actions) across 32 categories on GitHub Marketplace. We explored and described the policies of this marketplace as a unique platform where developers share production tools for the use of other developers. Furthermore, we conducted a systematic mapping of 515 research papers published from 2000 to 2021 and compared open-source academic production tools with those available in the marketplace.

Results:

We found that although some of the automation topics in literature are widely used in practice, they have yet to align with the state-of-practice for automated production. We discovered that practitioners often use automation tools for tasks like “Continuous Integration” and “Utilities”, while researchers tend to focus more on “Code Quality” and “Testing”.

Conclusion:

Our study illuminates the landscape of open-source tools for automation production. We also explored the disparities between industry trends and researchers’ priorities. Recognizing these distinctions can empower researchers to build on existing work and guide practitioners in selecting tools that meet their specific needs. Bridging this gap between industry and academia helps with further innovation in the field and ensures that research remains pertinent to the evolving challenges in software production.

背景：GitHub 以促进软件团队的协作代码版本控制和软件生产而闻名，2017 年推出了 GitHub Marketplace，扩大了服务范围。这个在线平台上的自动化工具可以帮助开发人员生产他们在 GitHub 上托管的项目，它已成为开源软件（OSS）社区使用的工具的宝贵信息来源。目标：在这项探索性研究中，我们通过全面探索该平台的特点、功能和政策来介绍作为软件市场的 GitHub Marketplace，找出生产自动化的共同主题。方法：我们采用了之前研究中的软件应用程序商店概念框架，并以此为基础，对 GitHub Marketplace 上 32 个类别中的 8318 个自动化生产工具（440 个应用程序和 7878 个操作）进行了研究。我们探索并描述了该市场的政策，它是开发者分享生产工具供其他开发者使用的独特平台。此外，我们还对 2000 年至 2021 年期间发表的 515 篇研究论文进行了系统性的映射，并将开源学术生产工具与市场上的生产工具进行了比较。结果：我们发现，尽管文献中的一些自动化主题在实践中得到了广泛应用，但它们尚未与自动化生产的实践状况相一致。我们发现，实践者通常将自动化工具用于 "持续集成 "和 "实用工具 "等任务，而研究人员则更倾向于关注 "代码质量 "和 "测试"。我们还探讨了行业趋势与研究人员优先事项之间的差异。认识到这些差异可以使研究人员在现有工作的基础上更进一步，并指导从业人员选择满足其特定需求的工具。缩小行业与学术界之间的差距有助于该领域的进一步创新，并确保研究工作始终与软件生产领域不断变化的挑战相关联。

{"title":"GitHub marketplace for automation and innovation in software production","authors":"SK. Golam Saroar, Waseefa Ahmed, Elmira Onagh, Maleknaz Nayebi","doi":"10.1016/j.infsof.2024.107522","DOIUrl":"10.1016/j.infsof.2024.107522","url":null,"abstract":"<div><h3>Context:</h3><p>GitHub, renowned for facilitating collaborative code version control and software production in software teams, expanded its services in 2017 by introducing GitHub Marketplace. This online platform hosts automation tools to assist developers with the production of their GitHub-hosted projects, and it has become a valuable source of information on the tools used in the Open Source Software (OSS) community.</p></div><div><h3>Objective:</h3><p>In this exploratory study, we introduce GitHub Marketplace as a software marketplace by exploring the Characteristics, Features, and Policies of the platform comprehensively, identifying common themes in production automation. Further, we explore popular tools among practitioners and researchers and highlight disparities in the approach to these tools between industry and academia.</p></div><div><h3>Method:</h3><p>We adopted the conceptual framework of software app stores from previous studies and used that to examine 8,318 automated production tools (440 Apps and 7,878 Actions) across 32 categories on GitHub Marketplace. We explored and described the policies of this marketplace as a unique platform where developers share production tools for the use of other developers. Furthermore, we conducted a systematic mapping of 515 research papers published from 2000 to 2021 and compared open-source academic production tools with those available in the marketplace.</p></div><div><h3>Results:</h3><p>We found that although some of the automation topics in literature are widely used in practice, they have yet to align with the state-of-practice for automated production. We discovered that practitioners often use automation tools for tasks like “Continuous Integration” and “Utilities”, while researchers tend to focus more on “Code Quality” and “Testing”.</p></div><div><h3>Conclusion:</h3><p>Our study illuminates the landscape of open-source tools for automation production. We also explored the disparities between industry trends and researchers’ priorities. Recognizing these distinctions can empower researchers to build on existing work and guide practitioners in selecting tools that meet their specific needs. Bridging this gap between industry and academia helps with further innovation in the field and ensures that research remains pertinent to the evolving challenges in software production.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107522"},"PeriodicalIF":3.8,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001277/pdfft?md5=a5f34bd55e7fac78519b3f67fd64a5c6&pid=1-s2.0-S0950584924001277-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141693361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fine-tuning and prompt engineering for large language models-based code review automation 基于大型语言模型的代码审查自动化的微调和提示工程

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology

Pub Date : 2024-07-11 DOI: 10.1016/j.infsof.2024.107523

Chanathip Pornprasit, Chakkrit Tantithamthavorn

Context:

The rapid evolution of Large Language Models (LLMs) has sparked significant interest in leveraging their capabilities for automating code review processes. Prior studies often focus on developing LLMs for code review automation, yet require expensive resources, which is infeasible for organizations with limited budgets and resources. Thus, fine-tuning and prompt engineering are the two common approaches to leveraging LLMs for code review automation.

Objective:

We aim to investigate the performance of LLMs-based code review automation based on two contexts, i.e., when LLMs are leveraged by fine-tuning and prompting. Fine-tuning involves training the model on a specific code review dataset, while prompting involves providing explicit instructions to guide the model’s generation process without requiring a specific code review dataset.

Methods:

We leverage model fine-tuning and inference techniques (i.e., zero-shot learning, few-shot learning and persona) on LLMs-based code review automation. In total, we investigate 12 variations of two LLMs-based code review automation (i.e., GPT-3.5 and Magicoder), and compare them with the Guo et al.’s approach and three existing code review automation approaches (i.e., CodeReviewer, TufanoT5 and D-ACT).

Results:

The fine-tuning of GPT 3.5 with zero-shot learning helps GPT-3.5 to achieve 73.17%–74.23% higher EM than the Guo et al.’s approach. In addition, when GPT-3.5 is not fine-tuned, GPT-3.5 with few-shot learning achieves 46.38%–659.09% higher EM than GPT-3.5 with zero-shot learning.

Conclusions:

Based on our results, we recommend that (1) LLMs for code review automation should be fine-tuned to achieve the highest performance.; and (2) when data is not sufficient for model fine-tuning (e.g., a cold-start problem), few-shot learning without a persona should be used for LLMs for code review automation. Our findings contribute valuable insights into the practical recommendations and trade-offs associated with deploying LLMs for code review automation.

背景：大语言模型（LLM）的快速发展引发了人们对利用其功能实现代码审查流程自动化的极大兴趣。之前的研究通常侧重于开发用于代码审查自动化的 LLM，但这需要昂贵的资源，对于预算和资源有限的组织来说是不可行的。因此，微调和提示工程是利用 LLMs 实现代码审查自动化的两种常用方法。目标：我们旨在研究基于 LLMs 的代码审查自动化在微调和提示两种情况下的性能。微调包括在特定的代码审查数据集上训练模型，而提示则包括提供明确的指令来指导模型的生成过程，而不需要特定的代码审查数据集。方法：我们在基于 LLMs 的代码审查自动化中利用了模型微调和推理技术（即零镜头学习、少镜头学习和角色）。我们总共研究了两种基于 LLMs 的代码审查自动化方法（即 GPT-3.5 和 Magicoder）的 12 种变体，并将它们与 Guo 等人的方法和三种现有的代码审查自动化方法（即 CodeReviewer、TufanoT5 和 D-ACT）进行了比较。结论：基于我们的研究结果，我们建议：（1）用于代码审查自动化的 LLM 应进行微调，以实现最高性能；（2）当数据不足以对模型进行微调时（例如，冷启动问题），应采用少次学习方法、冷启动问题）时，用于代码审查自动化的 LLM 应使用无角色的少量学习。我们的研究结果为代码审查自动化部署 LLM 相关的实用建议和权衡提供了宝贵的见解。

{"title":"Fine-tuning and prompt engineering for large language models-based code review automation","authors":"Chanathip Pornprasit, Chakkrit Tantithamthavorn","doi":"10.1016/j.infsof.2024.107523","DOIUrl":"10.1016/j.infsof.2024.107523","url":null,"abstract":"<div><h3>Context:</h3><p>The rapid evolution of Large Language Models (LLMs) has sparked significant interest in leveraging their capabilities for automating code review processes. Prior studies often focus on developing LLMs for code review automation, yet require expensive resources, which is infeasible for organizations with limited budgets and resources. Thus, fine-tuning and prompt engineering are the two common approaches to leveraging LLMs for code review automation.</p></div><div><h3>Objective:</h3><p>We aim to investigate the performance of LLMs-based code review automation based on two contexts, i.e., when LLMs are leveraged by fine-tuning and prompting. Fine-tuning involves training the model on a specific code review dataset, while prompting involves providing explicit instructions to guide the model’s generation process without requiring a specific code review dataset.</p></div><div><h3>Methods:</h3><p>We leverage model fine-tuning and inference techniques (i.e., zero-shot learning, few-shot learning and persona) on LLMs-based code review automation. In total, we investigate 12 variations of two LLMs-based code review automation (i.e., GPT-3.5 and Magicoder), and compare them with the Guo et al.’s approach and three existing code review automation approaches (i.e., CodeReviewer, TufanoT5 and D-ACT).</p></div><div><h3>Results:</h3><p>The fine-tuning of GPT 3.5 with zero-shot learning helps GPT-3.5 to achieve 73.17%–74.23% higher EM than the Guo et al.’s approach. In addition, when GPT-3.5 is not fine-tuned, GPT-3.5 with few-shot learning achieves 46.38%–659.09% higher EM than GPT-3.5 with zero-shot learning.</p></div><div><h3>Conclusions:</h3><p>Based on our results, we recommend that (1) LLMs for code review automation should be fine-tuned to achieve the highest performance.; and (2) when data is not sufficient for model fine-tuning (e.g., a cold-start problem), few-shot learning without a persona should be used for LLMs for code review automation. Our findings contribute valuable insights into the practical recommendations and trade-offs associated with deploying LLMs for code review automation.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"175 ","pages":"Article 107523"},"PeriodicalIF":3.8,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001289/pdfft?md5=526a4187620c208e9aedacd19f66db65&pid=1-s2.0-S0950584924001289-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0