
Latest publications in IET Software

A Data-Driven Methodology for Quality Aware Code Fixing
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-06 · DOI: 10.1049/sfw2/4147669
Thomas Karanikiotis, Andreas L. Symeonidis

In today’s rapidly changing software development landscape, ensuring code quality is essential to reliability, maintainability, and security, among other aspects. Identifying code quality issues is a largely tractable task; implementing code quality improvements, however, can be complex and time-consuming. To address this problem, we present a novel methodology designed to assist developers by suggesting alternative code snippets that not only match the functionality of the original code but also improve its quality based on predefined metrics. Our system is based on a language-agnostic approach that allows the analysis of code snippets written in different programming languages. It employs advanced techniques to assess functional similarity and evaluates syntactic similarity, suggesting alternatives that minimize the need for extensive modification. The evaluation of our system along multiple axes demonstrates the effectiveness of our approach in providing usable code alternatives that are both functionally equivalent and syntactically similar to the original snippets, while significantly improving quality metrics. We argue that our methodology and tool can be valuable for the software engineering community, bridging the gap between the identification of code quality problems and the implementation of practical solutions that improve software quality.
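The abstract does not include the authors' implementation; as a rough illustration of the idea of ranking candidate snippets by syntactic similarity to the original (so that the suggested alternative needs the least modification), a token-level similarity measure could be used. The metric below (`difflib` sequence matching) is an assumed stand-in, not the paper's actual measure:

```python
import difflib


def syntactic_similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] between two code snippets.

    Illustrative stand-in for a syntactic-similarity measure; the metric
    actually used by the authors is not specified in the abstract.
    """
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


def rank_alternatives(original: str, candidates: list[str]) -> list[str]:
    # Prefer alternatives that would require the least modification effort.
    return sorted(candidates,
                  key=lambda c: syntactic_similarity(original, c),
                  reverse=True)


original = "for i in range(len(xs)): total += xs[i]"
candidates = ["total = sum(xs)", "for x in xs: total += x"]
best = rank_alternatives(original, candidates)[0]
```

Under this toy metric, the loop-based rewrite ranks above the `sum()` one-liner because it shares more tokens with the original, illustrating the trade-off the paper describes between quality improvement and modification effort.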

Citations: 0
A Commit Classification Framework Incorporated With Prompt Tuning and External Knowledge
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-04-26 · DOI: 10.1049/sfw2/5566134
Jiajun Tong, Xiaobin Rui

Commit classification is an important task in software maintenance, since it helps software developers classify code changes into different types according to their nature and purpose. This allows them to better understand how their development efforts are progressing, identify areas where they need improvement, and make informed decisions about when and how to release new versions of their software. However, existing methods are all discriminative models, usually with complex architectures that require additional output layers to produce class label probabilities, making them task-specific and unable to learn features across different tasks. Moreover, they require a large amount of labeled data for fine-tuning, and it is difficult to learn effective classification boundaries when labeled data are limited. To solve these problems, we propose a generative framework that incorporates prompt tuning for commit classification with external knowledge (IPCK), which simplifies the model structure and learns features across different tasks, using only the commit message as input. First, we propose a generative framework based on T5 (text-to-text transfer transformer). This encoder–decoder construction unifies different commit classification tasks into a text-to-text problem, simplifying the model’s structure by not requiring an extra output layer. Second, instead of fine-tuning, we design a prompt tuning solution that can be adopted in few-shot scenarios with only limited samples. Furthermore, we incorporate external knowledge via an external knowledge graph to map the probabilities of words onto the final labels in the verbalizer step, improving performance in few-shot scenarios. Extensive experiments on two openly available datasets demonstrate that our framework solves the commit classification problem simply but effectively, reaching 90% accuracy for single-label binary classification and 83% for single-label multiclass classification. Further, in few-shot scenarios, our method improves the adaptability of the model without requiring a large number of training samples for fine-tuning.
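As background on the prompt-tuning setup the abstract describes, a cloze-style template turns commit classification into a text-to-text task, and a verbalizer maps the decoder's word probabilities onto the final labels. The sketch below is illustrative only: the template and word lists are invented (the paper derives its label words from an external knowledge graph), and a toy probability dictionary stands in for the T5 decoder output:

```python
# Cloze-style template: the model fills the masked slot with a word.
TEMPLATE = "Commit message: {msg} . This change is about [MASK]."

# Verbalizer: label -> words whose probability mass counts toward that label.
# These word sets are invented for illustration.
VERBALIZER = {
    "bugfix":  {"bug", "fix", "error"},
    "feature": {"feature", "add", "support"},
}


def build_prompt(msg: str) -> str:
    return TEMPLATE.format(msg=msg)


def classify(word_probs: dict[str, float]) -> str:
    """Aggregate per-word probabilities per label and return the argmax."""
    scores = {
        label: sum(word_probs.get(w, 0.0) for w in words)
        for label, words in VERBALIZER.items()
    }
    return max(scores, key=scores.get)


# Toy distribution standing in for the decoder's word probabilities.
probs = {"fix": 0.4, "bug": 0.2, "add": 0.1}
label = classify(probs)
```

Because the generative model only ever emits text, no extra task-specific output layer is needed; swapping the template and verbalizer adapts the same model to a different classification task.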

Citations: 0
Multisource Heterogeneous Data Fusion Methods Driven by Digital Twin on Basis of Prophet Algorithm
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-04-22 · DOI: 10.1049/sfw2/5041019
Min Li

With the development of intelligent manufacturing and the wider application of the Internet of Things (IoT), it is crucial to fuse heterogeneous sensor data from multiple sources. However, current data fusion methods still have problems, such as low accuracy of fused data, insufficient data integrity, poor fusion efficiency, and poor scalability. In response to these issues, this article explores a digital-twin-driven multisource heterogeneous data fusion method based on the Prophet algorithm, to improve the fusion of sensor data and provide more support for subsequent decision-making. The article first used curve and sequence alignment to extract data features and then analyzed the trend of data changes using the Prophet algorithm. Afterward, this article constructed a digital twin model to provide analytical views and data services. Finally, this paper used tensor decomposition to merge text and image data from the sensor streams. Deep learning algorithms and Kalman filtering techniques were also examined to confirm the efficacy of data fusion under the Prophet algorithm. The experimental results showed that after fusing the data using the Prophet algorithm, the average accuracy reached 92.63%, while the average resource utilization was only 9.97%. The results showed that combining Prophet with digital twin technology can achieve higher accuracy, better fusion efficiency, and better scalability. The research in this paper can provide new ideas and means for the fusion and analysis of heterogeneous data from multiple sources.
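Among the techniques the abstract says were examined is Kalman filtering. As background only (this is not the paper's implementation), the core of Kalman-style sensor fusion in one dimension is inverse-variance weighting of two noisy measurements of the same quantity:

```python
def fuse(x1: float, var1: float, x2: float, var2: float) -> tuple[float, float]:
    """Kalman-style fusion of two noisy measurements of the same quantity.

    Standard inverse-variance weighting: the fused estimate trusts the
    lower-variance sensor more, and the fused variance shrinks below
    either input variance.
    """
    k = var1 / (var1 + var2)   # gain pulling the estimate toward sensor 2
    x = x1 + k * (x2 - x1)     # fused estimate
    var = (1.0 - k) * var1     # fused (reduced) variance
    return x, var


# Two sensors observe the same temperature; sensor 2 is noisier (var 4 vs. 1),
# so the fused value stays close to sensor 1's reading.
x, var = fuse(20.0, 1.0, 22.0, 4.0)
```

Here `fuse(20.0, 1.0, 22.0, 4.0)` yields an estimate of 20.4 with variance 0.8, i.e., more certain than either sensor alone.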

Citations: 0
A Systematic Literature Review on Graphical User Interface Testing Through Software Patterns
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-04-12 · DOI: 10.1049/sfw2/9140693
Ambreen Kousar, Saif Ur Rehman Khan, Atif Mashkoor, Javed Iqbal

Context: Graphical user interface (GUI) testing of mobile applications (apps) is significant from a user perspective to ensure that the apps are visually appealing and user-friendly. Pattern-based GUI testing (PBGT) is an innovative model-based testing (MBT) approach designed to enhance user satisfaction and reusability while minimizing the effort required to model and test UIs of mobile apps. In the literature, several primary studies have been conducted in the domain of PBGT.

Problem: The current state-of-the-art lacks comprehensive secondary studies within the PBGT domain. To our knowledge, in-depth research in this area remains scarce. Consequently, numerous challenges and limitations persist in the existing literature.

Objective: This study aims to fill the gaps mentioned above in the existing body of knowledge. We highlight popular research topics and analyze their relationships. We explore current state-of-the-art approaches and techniques, a taxonomy of tools and modeling languages, a list of reported UI test patterns (UITPs), and a taxonomy of writing UITPs. We also highlight practical challenges, limitations, and gaps in the targeted research area. Furthermore, the current study intends to highlight future research directions in this domain.

Method: We conducted a systematic literature review (SLR) on PBGT in the context of Android and web apps. A hybrid methodology that combines the Kitchenham and PRISMA guidelines is adopted to achieve the targeted research objectives (ROs). We perform a keyword-based search on well-known databases and select 30 (out of 557) studies.

Results: The current study identifies 11 tools used in PBGT and devises a taxonomy to categorize these tools. A taxonomy for writing UITPs has also been developed. In addition, we outline the limitations of the targeted research domain and future directions.

Conclusion: This study benefits the community and readers by better understanding the targeted research area. A comprehensive knowledge of existing tools, techniques, and methodologies is helpful for practitioners. Moreover, the identified limitations, gaps, emerging trends, and future research directions will benefit researchers who intend to work further in future research.

Citations: 0
Automated Hybrid Methodology for Software Architecture Style Selection Using Analytic Hierarchy Process and Fuzzy Analytic Hierarchy Process
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-04-03 · DOI: 10.1049/sfw2/9943825
Muna Alrazgan, Ahmed Ghoneim, Luluah Albesher, Razan Aldossari, Shahad Alotaibi, Lama Alsaykhan, Norah Alshahrani, Maha Alshammari

In software engineering, selecting the appropriate architectural style for a software system is a risky and sensitive decision. The selection process is a multicriteria decision-making (MCDM) problem; consequently, selecting a suitable architecture is a key challenge in software development. This study presents an automated hybrid methodology based on the analytic hierarchy process (AHP) and fuzzy analytic hierarchy process (FAHP) to evaluate and suggest multiple architectural styles based on quality attributes (QAs) alone, rather than relying on expert opinions. A Tera-PROMISE dataset is used to illustrate the proposed methodology, and its results are compared with expert judgments. Moreover, to support the proposed methodology, a case study is carried out comparing the proposed method to previous studies.
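As background on the AHP building block the methodology uses: AHP derives a priority vector over criteria (here, quality attributes) from a pairwise comparison matrix. A minimal sketch using the row geometric-mean approximation is shown below; the comparison values are invented for illustration and are not taken from the paper:

```python
import math


def ahp_weights(pairwise: list[list[float]]) -> list[float]:
    """Approximate the AHP priority vector via the row geometric-mean method.

    `pairwise[i][j]` encodes how strongly criterion i is preferred over
    criterion j on Saaty's 1-9 scale, with pairwise[j][i] = 1 / pairwise[i][j].
    """
    n = len(pairwise)
    # Geometric mean of each row, then normalize to sum to 1.
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


# Toy 3-criteria comparison (e.g., performance vs. modifiability vs. security);
# the numbers are illustrative assumptions only.
m = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
w = ahp_weights(m)
```

The resulting weights sum to 1 and rank the first criterion highest, matching the matrix's preferences; FAHP replaces the crisp 1-9 judgments with fuzzy numbers to capture uncertainty in the comparisons.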

Citations: 0
Blockchain Consensus Scheme Based on the Proof of Distributed Deep Learning Work
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-01-21 · DOI: 10.1049/sfw2/3378383
Hui Zhi, HongCheng Wu, Yu Huang, ChangLin Tian, SuZhen Wang

With the development of artificial intelligence and blockchain technology, training deep learning models requires large computing resources. Meanwhile, the Proof of Work (PoW) consensus mechanism in blockchain systems often wastes computing resources. This article combines distributed deep learning (DDL) with blockchain technology and proposes a blockchain consensus scheme based on the proof of distributed deep learning work (BCDDL) to reduce the waste of computing resources in blockchain. BCDDL treats DDL training as the mining task and allocates different training data to different nodes based on their computing power to improve the utilization rate of computing resources. To balance the demand for and supply of computing resources, and to incentivize nodes to participate in training tasks and consensus, a dynamic incentive mechanism based on task size and computing resources (DIM-TSCR) is proposed. In addition, to reduce the impact of malicious nodes on the accuracy of the global model, a model aggregation algorithm based on training data size and model accuracy (MAA-TM) is designed. Experiments demonstrate that BCDDL significantly increases the utilization rate of computing resources and diminishes the impact of malicious nodes on the accuracy of the global model.
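For context on the waste BCDDL targets: classic PoW mining is a brute-force nonce search whose computation serves no purpose beyond the puzzle itself. A toy version (illustrative only, with a deliberately small difficulty) looks like this; BCDDL's idea is to replace this search with useful DDL training work:

```python
import hashlib


def pow_mine(block_data: str, difficulty: int) -> int:
    """Classic PoW: find a nonce whose SHA-256 hash starts with
    `difficulty` zero hex digits. All hashes computed along the way
    are discarded work, which is the waste useful-work schemes avoid."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1


# Small difficulty keeps the search fast enough to run as a demo.
nonce = pow_mine("block-42", 2)
```

On average this needs 16^2 = 256 hash attempts at difficulty 2; real networks use difficulties high enough that the aggregate discarded computation is enormous, which motivates proof-of-useful-work designs such as BCDDL.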

Citations: 0
Code Parameter Summarization Based on Transformer and Fusion Strategy
IF 1.3 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2024-12-31 · DOI: 10.1049/sfw2/3706673
Fanlong Zhang, Jiancheng Fan, Weiqi Li, Siau-cheng Khoo

Context: As more time is spent on code comprehension activities during software development, automatic code summarization has received much attention in software engineering research, with the goal of enhancing software comprehensibility. Meanwhile, it is widely recognized that a good knowledge of the declaration and use of method parameters can effectively enhance the understanding of the associated methods. A traditional approach in software development is to declare the types of method parameters.

Objective: In this work, we advocate parameter-level code summarization and propose a novel approach to automatically generate parameter summaries for a given method. Parameter summarization is considerably challenging, as it is unclear both what kinds of parameter-related information can be employed for summarization and how such information can be retrieved.

Method: We present paramTrans, a novel approach for parameter summarization. paramTrans characterizes semantic features from parameter-related information using a transformer; it also explores three fusion strategies for absorbing method-level information to enhance performance. Moreover, to retrieve parameter-related information, a parameter slicing algorithm (named paramSlice) is proposed, which slices parameter-related nodes from the abstract syntax tree (AST) at the statement level.
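To make the idea of a statement-level parameter slice concrete, the toy sketch below (not the paper's paramSlice, which the abstract does not specify in detail) keeps only those top-level statements of a function whose AST subtree mentions the given parameter name:

```python
import ast


def param_slice(source: str, func_name: str, param: str) -> list[str]:
    """Toy statement-level parameter slice: return the source of each
    top-level statement in `func_name` whose subtree uses the name `param`.

    Illustrative approximation only; the paper's paramSlice algorithm
    may differ substantially.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return [
                ast.unparse(stmt)
                for stmt in node.body
                if any(isinstance(n, ast.Name) and n.id == param
                       for n in ast.walk(stmt))
            ]
    return []


src = """
def scale(values, factor):
    total = 0
    for v in values:
        total += v * factor
    return total
"""
# Only the loop statement mentions `factor`, so only it survives the slice.
lines = param_slice(src, "scale", "factor")
```

The surviving statements are exactly the parameter-related context a summarizer would need as input; the hypothetical function and parameter names here are invented for the example.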

Results: We conducted experiments to verify the effectiveness of our approach. Experimental results show that our approach possesses an effective ability in summarizing parameters; such ability can be further enhanced by understanding the available summaries about individual methods, through the introduction of three fusion strategies.

Conclusion: We recommend developers employ our approach as well as the fusion strategies to produce parameter summaries to enhance the comprehensibility of code.

{"title":"Code Parameter Summarization Based on Transformer and Fusion Strategy","authors":"Fanlong Zhang,&nbsp;Jiancheng Fan,&nbsp;Weiqi Li,&nbsp;Siau-cheng Khoo","doi":"10.1049/sfw2/3706673","DOIUrl":"10.1049/sfw2/3706673","url":null,"abstract":"<p><b>Context:</b> As more time has been spent on code comprehension activities during software development, automatic code summarization has received much attention in software engineering research, with the goal of enhancing software comprehensibility. In the meantime, it is prevalently known that a good knowledge about the declaration and the use of method parameters can effectively enhance the understanding of the associated methods. A traditional approach used in software development is to declare the types of method parameters.</p><p><b>Objective:</b> In this work, we advocate parameter-level code summarization and propose a novel approach to automatically generate parameter summaries of a given method. Parameter summarization is considerably challenging, as neither do we know the kind of information of the parameters that can be employed for summarization nor do we know the methods for retrieving such information.</p><p><b>Method:</b> We present paramTrans, which is a novel approach for parameter summarization. paramTrans characterizes the semantic features from parameter-related information based on transformer; it also explores three fusion strategies for absorbing the method-level information to enhance the performance. Moreover, to retrieve parameter-related information, a parameter slicing algorithm (named paramSlice) is proposed, which slices the parameter-related node from the abstract syntax tree (AST) at the statement level.</p><p><b>Results:</b> We conducted experiments to verify the effectiveness of our approach. 
Experimental results show that our approach possesses an effective ability in summarizing parameters; such ability can be further enhanced by understanding the available summaries about individual methods, through the introduction of three fusion strategies.</p><p><b>Conclusion:</b> We recommend developers employ our approach as well as the fusion strategies to produce parameter summaries to enhance the comprehensibility of code.</p>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/sfw2/3706673","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
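The statement-level slicing idea behind paramSlice can be sketched with Python's stdlib `ast` module. This is a minimal illustration assuming Python source as input; the paper's actual algorithm, AST representation, and target language are not given here, so the function name and behavior below are illustrative only:

```python
import ast

def param_slice(source: str, func_name: str, param: str) -> list[str]:
    """Collect the statements of `func_name` whose subtree mentions `param`."""
    tree = ast.parse(source)
    slices = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            for stmt in node.body:
                # A statement is parameter-related if any Name node in it
                # refers to the parameter.
                names = {n.id for n in ast.walk(stmt) if isinstance(n, ast.Name)}
                if param in names:
                    slices.append(ast.unparse(stmt).strip())
    return slices

src = """
def scale(values, factor):
    total = 0
    result = [v * factor for v in values]
    total = len(result)
    return result
"""
print(param_slice(src, "scale", "factor"))
```

The slice keeps only the statements that reference the parameter, which mirrors extracting parameter-related nodes from the AST at statement granularity.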
Citations: 0
An Observational Study on Flask Web Framework Questions on Stack Overflow (SO) 关于Flask Web框架Stack Overflow (SO)问题的观察研究
IF 1.3 Tier 4 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-12-19 DOI: 10.1049/sfw2/1905538
Luluh Albesher, Reem Alfayez

Web-based applications are in high demand and widespread use. To facilitate their development, the software engineering community has built multiple web application frameworks, one of which is Flask. Flask is a popular web framework that allows developers to speed up and scale the development of web applications. A review of the software engineering literature shows that the Stack Overflow (SO) website has proven effective in providing a better understanding of many subjects within the software engineering field. This study analyzes SO Flask-related questions to gain a better understanding of the standing of Flask on the website. We identified a set of 70,230 Flask-related questions, which we further analyzed to estimate how interest in the framework evolved over time on the website. Afterward, we utilized the Latent Dirichlet Allocation (LDA) algorithm to identify the Flask-related topics discussed within the set of identified questions. Moreover, we leveraged a number of proxy measures to examine the difficulty and popularity of the identified topics. The study found that interest in Flask generally increased on the website, peaking in 2020 and dropping in the following years. Moreover, Flask-related questions on SO revolve around 12 topics, with Application Programming Interfaces (APIs) being the most popular topic and background tasks the most difficult. Software engineering researchers, practitioners, educators, and Flask contributors may find this study useful in guiding their future Flask-related endeavors.
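The LDA step can be illustrated with a tiny collapsed Gibbs sampler over toy question-like documents — a stdlib-only sketch; the study would run a full LDA implementation on the real SO corpus, so the corpus, topic count, and hyperparameters below are placeholders:

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Counts: doc-topic, topic-word, topic totals, and per-token assignments.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * V for _ in range(n_topics)]
    nk = [0] * n_topics
    z = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1
            nkw[k][widx[w]] += 1
            nk[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][widx[w]] -= 1; nk[k] -= 1
                # Resample proportional to P(topic | doc) * P(word | topic).
                weights = [(ndk[d][t] + alpha) * (nkw[t][widx[w]] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
    top_words = [[vocab[i] for i in sorted(range(V), key=lambda i: -nkw[t][i])[:3]]
                 for t in range(n_topics)]
    return top_words, ndk

docs = [["flask", "route", "url"], ["flask", "route", "blueprint"],
        ["celery", "task", "worker"], ["background", "task", "celery"]]
topics, doc_topic = lda_gibbs(docs, n_topics=2)
print(topics)
```

With two topics, the sampler tends to separate the routing-flavored titles from the background-task titles, echoing how topics such as API and background tasks emerge from the real question set.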

Citations: 0
Software Defect Prediction Method Based on Clustering Ensemble Learning 基于聚类集合学习的软件缺陷预测方法
IF 1.3 Tier 4 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-11-19 DOI: 10.1049/2024/6294422
Hongwei Tao, Qiaoling Cao, Haoran Chen, Yanting Li, Xiaoxu Niu, Tao Wang, Zhenhao Geng, Songtao Shang

The technique of software defect prediction aims to assess and predict potential defects in software projects and has made significant progress in recent years within software development. In previous studies, this technique largely relied on supervised learning methods, requiring a substantial amount of labeled historical defect data to train the models. However, obtaining these labeled data often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on known labeled data, eliminating the need for large-scale data labeling, thereby saving considerable time and resources while providing a more flexible solution for ensuring software quality. This paper conducts software defect prediction using unsupervised learning methods on data from 16 projects across two public datasets (PROMISE and NASA). During the feature selection step, a chi-squared sparse feature selection method is proposed. This feature selection strategy combines chi-squared tests with sparse principal component analysis (SPCA). Specifically, the chi-squared test is first used to filter out the most statistically significant features, and then the SPCA is applied to reduce the dimensionality of these significant features. In the clustering step, the dot product matrix and Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed. This method integrates spectral clustering, Newman clustering, fluid clustering, and Clauset–Newman–Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, using the chi-squared sparse method for feature selection demonstrates superior performance, and the proposed clustering overlap method outperforms or is comparable to the effectiveness of the four baseline clustering methods.
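The chi-squared filtering idea can be sketched as follows — a stdlib-only illustration that scores binary features against defect labels and keeps the top-scoring ones; the paper pairs this test with SPCA dimensionality reduction, which is omitted here, and the data below are synthetic:

```python
def chi2_score(feature, labels):
    """Chi-squared statistic of a binary feature against binary labels."""
    n = len(labels)
    # Observed counts for the 2x2 contingency table.
    obs = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, labels):
        obs[(f, y)] += 1
    f_tot = {f: obs[(f, 0)] + obs[(f, 1)] for f in (0, 1)}
    y_tot = {y: obs[(0, y)] + obs[(1, y)] for y in (0, 1)}
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            exp = f_tot[f] * y_tot[y] / n
            if exp:
                score += (obs[(f, y)] - exp) ** 2 / exp
    return score

def select_top_k(X, labels, k):
    """Return indices of the k features with the largest chi-squared scores."""
    scores = [chi2_score(col, labels) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Toy data: feature 0 tracks the label perfectly, feature 1 is noise.
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]
labels = [1, 1, 0, 0, 1, 0]
print(select_top_k(X, labels, 1))  # → [0]
```

In the paper's pipeline, the surviving features would then be fed to SPCA before the clustering-ensemble step.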

Citations: 0
ConCPDP: A Cross-Project Defect Prediction Method Integrating Contrastive Pretraining and Category Boundary Adjustment ConCPDP:整合对比预训练和类别边界调整的跨项目缺陷预测方法
IF 1.3 Tier 4 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-11-13 DOI: 10.1049/2024/5102699
Hengjie Song, Yufei Pan, Feng Guo, Xue Zhang, Le Ma, Siyu Jiang

Software defect prediction (SDP) is a crucial phase preceding the launch of software products. Cross-project defect prediction (CPDP) is introduced to anticipate defects in new projects that lack defect labels. CPDP can use the defect information of mature projects to speed up defect prediction for new projects, so that developers can quickly obtain the defect information of a new project and test it in a targeted manner. At present, the predominant approaches in CPDP rely on deep learning, and the performance of the final model is notably affected by the quality of the training dataset. However, CPDP datasets not only have few samples but also have almost no label information for new projects, which makes generic deep-learning-based CPDP models less than ideal. In addition, most current CPDP models do not fully consider enriching the samples near the classification boundary after cross-domain transfer, leading to suboptimal predictive capability. To overcome these obstacles, we present contrastive learning pretraining for CPDP (ConCPDP), a CPDP method integrating contrastive pretraining and category boundary adjustment. We first perform data augmentation on the source- and target-domain code files and then parse the augmented code into an abstract syntax tree (AST). The AST is then transformed into an integer sequence using specific mapping rules, serving as input for the subsequent neural network. A neural network based on bidirectional long short-term memory (Bi-LSTM) receives the integer sequence and outputs a feature vector. Then, the feature vectors are input into the contrastive module to optimise the feature extraction network. The pretrained feature extractor can be fine-tuned using the maximum mean discrepancy (MMD) between the feature distributions of the source and target domains, together with the binary classification loss on the source domain.
This paper conducts a large number of experiments on the PROMISE dataset, which is commonly used for CPDP, to validate ConCPDP’s efficacy, achieving superior results in terms of F1 measure, area under curve (AUC), and Matthew’s correlation coefficient (MCC).
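The MMD term used during fine-tuning can be sketched as follows — a stdlib-only illustration with an RBF kernel over toy feature vectors; the kernel choice, bandwidth, and estimator variant are assumptions here, not the paper's configuration:

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (m * m)
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (n * n)
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2 * kxy

# Identical samples give zero discrepancy; shifted samples give a larger one.
src = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]
tgt_near = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]
tgt_far = [(2.0, 2.1), (2.2, 2.0), (2.1, 2.1)]
print(mmd2(src, tgt_near))  # → 0.0
print(mmd2(src, tgt_far))   # noticeably larger
```

Minimizing such a term pulls the source- and target-domain feature distributions together, which is what lets the classifier trained on labeled source data transfer to the unlabeled target project.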

Citations: 0