IET Software: Latest Articles

Blockchain Consensus Scheme Based on the Proof of Distributed Deep Learning Work

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2025-01-21 DOI: 10.1049/sfw2/3378383

Hui Zhi, HongCheng Wu, Yu Huang, ChangLin Tian, SuZhen Wang

With the development of artificial intelligence and blockchain technology, training deep learning models requires substantial computing resources. Meanwhile, the Proof of Work (PoW) consensus mechanism in blockchain systems often wastes computing resources. This article combines distributed deep learning (DDL) with blockchain technology and proposes a blockchain consensus scheme based on the proof of distributed deep learning work (BCDDL) to reduce this waste. BCDDL treats DDL training as a mining task and allocates different training data to different nodes based on their computing power to improve the utilization rate of computing resources. To balance the demand and supply of computing resources and to incentivize nodes to participate in training tasks and consensus, a dynamic incentive mechanism based on task size and computing resources (DIM-TSCR) is proposed. In addition, to reduce the impact of malicious nodes on the accuracy of the global model, a model aggregation algorithm based on training data size and model accuracy (MAA-TM) is designed. Experiments demonstrate that BCDDL can significantly increase the utilization rate of computing resources and diminish the impact of malicious nodes on the accuracy of the global model.
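
The two ideas in this abstract, allocating training data by computing power and aggregating models by data size and accuracy, can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so the proportional split and the `size * accuracy` weight below are assumptions, and `allocate_samples` and `aggregate` are hypothetical names rather than the authors' implementation.

```python
from typing import Dict, List


def allocate_samples(total_samples: int, power: Dict[str, float]) -> Dict[str, int]:
    """Split total_samples across nodes in proportion to their computing power."""
    total_power = sum(power.values())
    shares = {n: int(total_samples * p / total_power) for n, p in power.items()}
    # Samples lost to rounding down are handed out one each, strongest nodes first.
    leftover = total_samples - sum(shares.values())
    for node in sorted(power, key=power.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares


def aggregate(models: List[List[float]], sizes: List[int], accs: List[float]) -> List[float]:
    """Weighted average of parameter vectors.

    Assumption: weight = training data size * validation accuracy, so a node
    that submits a low-accuracy (possibly malicious) model contributes less.
    """
    weights = [s * a for s, a in zip(sizes, accs)]
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) / total for i in range(dim)]


if __name__ == "__main__":
    print(allocate_samples(1000, {"node_a": 1.0, "node_b": 3.0}))
    print(aggregate([[1.0, 1.0], [3.0, 3.0]], sizes=[100, 100], accs=[0.9, 0.1]))
```

In the second call the two nodes hold equal amounts of data, but the low-accuracy model receives one ninth of the weight of the other, so the aggregate stays close to the accurate model.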

Citations: 0

Code Parameter Summarization Based on Transformer and Fusion Strategy

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-12-31 DOI: 10.1049/sfw2/3706673

Fanlong Zhang, Jiancheng Fan, Weiqi Li, Siau-cheng Khoo

Context: As more time is spent on code comprehension during software development, automatic code summarization has received much attention in software engineering research, with the goal of enhancing software comprehensibility. It is also widely known that good knowledge of the declaration and use of method parameters can effectively enhance the understanding of the associated methods. A traditional approach used in software development is to declare the types of method parameters.

Objective: In this work, we advocate parameter-level code summarization and propose a novel approach to automatically generate parameter summaries of a given method. Parameter summarization is considerably challenging, as we know neither which information about the parameters can be employed for summarization nor how to retrieve such information.

Method: We present paramTrans, a novel approach for parameter summarization. paramTrans characterizes the semantic features of parameter-related information using a transformer; it also explores three fusion strategies for absorbing method-level information to enhance performance. Moreover, to retrieve parameter-related information, a parameter slicing algorithm (named paramSlice) is proposed, which slices the parameter-related nodes from the abstract syntax tree (AST) at the statement level.

Results: We conducted experiments to verify the effectiveness of our approach. The results show that our approach summarizes parameters effectively, and that this ability can be further enhanced by drawing on the available summaries of individual methods through the three fusion strategies.

Conclusion: We recommend that developers employ our approach together with the fusion strategies to produce parameter summaries and so enhance the comprehensibility of code.
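
The idea of slicing parameter-related statements out of an AST can be sketched with Python's standard `ast` module. This is only an illustration: the abstract does not describe paramSlice in detail or name its target language, so the function `slice_parameter`, the use of Python source, and the rule "keep every top-level statement that mentions the parameter by name" are assumptions.

```python
import ast
from typing import List


def slice_parameter(source: str, func_name: str, param: str) -> List[int]:
    """Return line numbers of top-level statements in func_name that mention param."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return [
                stmt.lineno
                for stmt in node.body
                # A statement is kept if any name inside it refers to the parameter.
                if any(isinstance(n, ast.Name) and n.id == param for n in ast.walk(stmt))
            ]
    raise ValueError(f"function {func_name!r} not found")


# Toy input, made up for this example.
EXAMPLE = '''def scale(values, factor):
    total = sum(values)
    if factor == 0:
        return []
    return [v * factor / total for v in values]
'''

if __name__ == "__main__":
    print(slice_parameter(EXAMPLE, "scale", "factor"))  # [3, 5]
    print(slice_parameter(EXAMPLE, "scale", "values"))  # [2, 5]
```

The sketch matches names only; it does not follow data flow, so a statement that uses `total` (derived from `values`) without naming `values` would be missed. A fuller slicer would also track such derived variables.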

Citations: 0

An Observational Study on Flask Web Framework Questions on Stack Overflow (SO)

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-12-19 DOI: 10.1049/sfw2/1905538

Luluh Albesher, Reem Alfayez

Web-based applications are in high demand and wide use. To facilitate their development, the software engineering community has developed multiple web application frameworks, one of which is Flask. Flask is a popular web framework that allows developers to speed up and scale the development of web applications. A review of the software engineering literature revealed that the Stack Overflow (SO) website has proven effective in providing a better understanding of multiple subjects within the software engineering field. This study analyzes Flask-related questions on SO to gain a better understanding of the standing of Flask on the website. We identified a set of 70,230 Flask-related questions, which we analyzed to estimate how interest in the framework evolved over time on the website. Afterward, we utilized the Latent Dirichlet Allocation (LDA) algorithm to identify the Flask-related topics discussed in these questions. Moreover, we leveraged a number of proxy measures to examine the difficulty and popularity of the identified topics. The study found that interest in Flask has generally been increasing on the website, with a peak in 2020 and drops in the following years. Moreover, Flask-related questions on SO revolve around 12 topics, of which Application Programming Interface (API) can be considered the most popular and background tasks the most difficult. Software engineering researchers, practitioners, educators, and Flask contributors may find this study useful in guiding their future Flask-related endeavors.
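
The abstract does not say which proxy measures were used. As a rough sketch, the code below assumes two proxies that are common in Stack Overflow studies: mean view count for popularity and the share of questions without an accepted answer for difficulty. The `Question` type, the field names, and the sample rows in the usage section are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Question(NamedTuple):
    topic: str                 # topic label, e.g. assigned by a topic model
    views: int                 # view count of the question
    has_accepted_answer: bool


def topic_metrics(questions: List[Question]) -> Dict[str, Dict[str, float]]:
    """Per-topic popularity and difficulty proxies (both are assumptions)."""
    grouped = defaultdict(list)
    for q in questions:
        grouped[q.topic].append(q)
    metrics = {}
    for topic, qs in grouped.items():
        metrics[topic] = {
            # Popularity proxy: average number of views per question.
            "mean_views": sum(q.views for q in qs) / len(qs),
            # Difficulty proxy: percentage of questions with no accepted answer.
            "pct_without_accepted": 100.0 * sum(not q.has_accepted_answer for q in qs) / len(qs),
        }
    return metrics


if __name__ == "__main__":
    sample = [  # made-up rows, not data from the study
        Question("API", 100, True),
        Question("API", 300, False),
        Question("background tasks", 50, False),
    ]
    for topic, m in topic_metrics(sample).items():
        print(topic, m)
```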

Citations: 0

Software Defect Prediction Method Based on Clustering Ensemble Learning

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-11-19 DOI: 10.1049/2024/6294422

Hongwei Tao, Qiaoling Cao, Haoran Chen, Yanting Li, Xiaoxu Niu, Tao Wang, Zhenhao Geng, Songtao Shang

The technique of software defect prediction aims to assess and predict potential defects in software projects and has made significant progress in recent years. In previous studies, this technique largely relied on supervised learning methods, requiring a substantial amount of labeled historical defect data to train the models. However, obtaining these labeled data often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on known labeled data, eliminating the need for large-scale data labeling, thereby saving considerable time and resources while providing a more flexible solution for ensuring software quality. This paper conducts software defect prediction using unsupervised learning methods on data from 16 projects across two public datasets (PROMISE and NASA). During the feature selection step, a chi-squared sparse feature selection method is proposed. This feature selection strategy combines chi-squared tests with sparse principal component analysis (SPCA). Specifically, the chi-squared test is first used to select the most statistically significant features, and SPCA is then applied to reduce the dimensionality of these features. In the clustering step, the dot product matrix and Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed. This method integrates spectral clustering, Newman clustering, fluid clustering, and Clauset–Newman–Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, the chi-squared sparse method for feature selection demonstrates superior performance, and the proposed clustering overlap method performs better than or comparably to the four baseline clustering methods.
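
One step mentioned above, building a weighted adjacency matrix from Pearson correlation coefficients, can be sketched in plain Python. The abstract does not state how negative correlations or self-loops are handled, so clipping negative weights to zero and leaving the diagonal at zero are assumptions, and `pcc_adjacency` is a hypothetical name.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def pcc_adjacency(features: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric weighted adjacency matrix over samples (e.g. software modules)."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: negative correlations become "no edge" (weight 0).
            w = max(pearson(features[i], features[j]), 0.0)
            adj[i][j] = adj[j][i] = w
    return adj


if __name__ == "__main__":
    modules = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]  # toy metric vectors
    for row in pcc_adjacency(modules):
        print([round(v, 3) for v in row])
```

A graph built this way can then be passed to community detection or spectral clustering; those steps are left out to keep the sketch dependency-free.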

Citations: 0

ConCPDP: A Cross-Project Defect Prediction Method Integrating Contrastive Pretraining and Category Boundary Adjustment

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-11-13 DOI: 10.1049/2024/5102699

Hengjie Song, Yufei Pan, Feng Guo, Xue Zhang, Le Ma, Siyu Jiang

Software defect prediction (SDP) is a crucial phase preceding the launch of software products. Cross-project defect prediction (CPDP) was introduced to anticipate defects in new projects that lack defect labels. CPDP can use defect information from mature projects to speed up defect prediction for new projects, so that developers can quickly obtain defect information for the new project and test it in a targeted way. At present, the predominant approaches in CPDP rely on deep learning, and the performance of the final model is notably affected by the quality of the training dataset. However, CPDP datasets not only have few samples but also have almost no label information for new projects, which makes general deep-learning-based CPDP models less than ideal. In addition, most current CPDP models do not fully consider the enrichment of classification boundary samples after cross-domain transfer, leading to suboptimal predictive capability. To overcome these obstacles, we present contrastive learning pretraining for CPDP (ConCPDP), a CPDP method integrating contrastive pretraining and category boundary adjustment. We first perform data augmentation on the source and target domain code files and then extract the augmented data as abstract syntax trees (ASTs). Each AST is then transformed into an integer sequence using specific mapping rules, serving as input for the subsequent neural network. A neural network based on bidirectional long short-term memory (Bi-LSTM) receives the integer sequence and outputs a feature vector. The feature vectors are then fed into the contrastive module to optimise the feature extraction network. The pretrained feature extractor can be fine-tuned using the maximum mean discrepancy (MMD) between the feature distributions of the source and target domains together with the binary classification loss on the source domain. This paper conducts extensive experiments on the PROMISE dataset, which is commonly used for CPDP, to validate ConCPDP’s efficacy, achieving superior results in terms of F₁ measure, area under curve (AUC), and Matthews correlation coefficient (MCC).
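
The maximum mean discrepancy term used for fine-tuning can be illustrated with a small sketch. The abstract does not specify the kernel or estimator, so the Gaussian kernel, the bandwidth `sigma`, and the biased estimator below are assumptions; `mmd_squared` is a hypothetical helper, not ConCPDP code.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd_squared(source: List[Sequence[float]],
                target: List[Sequence[float]],
                sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sets of feature vectors.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 * E[k(s, t)]
    """
    def mean_kernel(xs, ys):
        return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))


if __name__ == "__main__":
    src = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]   # toy source-domain features
    tgt = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.2]]   # toy target-domain features
    print("same distribution:", mmd_squared(src, src))
    print("shifted distribution:", mmd_squared(src, tgt))
```

The value is near zero when the two feature sets coincide and grows as they drift apart, which is why minimising it alongside the source-domain classification loss pulls the two domains together in feature space.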

Citations: 0

Breaking the Blockchain Trilemma: A Comprehensive Consensus Mechanism for Ensuring Security, Scalability, and Decentralization

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-10-10 DOI: 10.1049/2024/6874055

Khandakar Md Shafin, Saha Reno

The ongoing challenge in the world of blockchain technology is finding a solution to the trilemma of balancing decentralization, security, and scalability. This paper introduces a pioneering blockchain architecture designed to transcend this trilemma, uniting advanced cryptographic methods, inventive security protocols, and dynamic decentralization mechanisms. Employing established techniques such as elliptic curve cryptography, Schnorr verifiable random function, and zero-knowledge proof (zk-SNARK), alongside groundbreaking methodologies for stake distribution, anomaly detection, and incentive alignment, our framework sets a new benchmark for secure, scalable, and decentralized blockchain ecosystems. The proposed system surpasses top-tier consensus mechanisms by attaining a throughput of 1700+ transactions per second, ensuring robust security against all well-known blockchain attacks without compromising scalability, and demonstrating solid decentralization in benchmark analysis alongside 25 other blockchain systems, all achieved with an affordable hardware cost for validators and an average CPU usage of only 16.1%.

Citations: 0

IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-09-16 DOI: 10.1049/2024/8027037

Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin

Software defect prediction (SDP) has been a prominent area of research in software engineering. Previous SDP methods often struggled in industrial applications, primarily due to the need for sufficient historical data. Thus, clustering-based unsupervised defect prediction (CUDP) and cross-project defect prediction (CPDP) emerged to address this challenge. However, the former exhibited limitations in capturing semantic and structural features, while the latter encountered constraints due to differences in data distribution across projects. Therefore, we introduce a novel framework called improved clustering with graph-embedding-based features (IC-GraF) for SDP without the reliance on historical data. First, a preprocessing operation is performed to extract program dependence graphs (PDGs) and mark distinct dependency relationships within them. Second, the improved deep graph infomax (IDGI) model, an extension of the DGI model specifically for SDP, is designed to generate graph-level representations of PDGs. Finally, a heuristic-based k-means clustering algorithm is employed to classify the features generated by IDGI. To validate the efficacy of IC-GraF, we conduct experiments based on 24 releases of the PROMISE dataset, using F-measure and G-measure as evaluation criteria. The findings indicate that IC-GraF achieves 5.0%–42.7% higher F-measure, 5%–39.4% higher G-measure, and 2.5%–11.4% higher AUC over existing CUDP methods. Even when compared with eight supervised learning-based SDP methods, IC-GraF maintains a superior competitive edge.
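
The two evaluation criteria named above can be computed from a confusion matrix as follows. The abstract does not define them, so this sketch assumes the definitions commonly used in defect prediction: F-measure as the harmonic mean of precision and recall, and G-measure as the harmonic mean of recall (pd) and one minus the false-positive rate (pf). The function names are illustrative.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_measure(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of pd (recall) and 1 - pf (assumed definition)."""
    pd = tp / (tp + fn) if tp + fn else 0.0   # probability of detection
    pf = fp / (fp + tn) if fp + tn else 0.0   # probability of false alarm
    if pd + (1 - pf) == 0:
        return 0.0
    return 2 * pd * (1 - pf) / (pd + (1 - pf))


if __name__ == "__main__":
    # Toy confusion matrix: 40 true positives, 10 false positives,
    # 40 true negatives, 10 false negatives.
    print(f_measure(tp=40, fp=10, fn=10))
    print(g_measure(tp=40, fp=10, tn=40, fn=10))
```

Unlike F-measure, G-measure takes true negatives into account through pf, so it is less sensitive to the class imbalance typical of defect datasets.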

软件缺陷预测（SDP）一直是软件工程的一个重要研究领域。以前的 SDP 方法在工业应用中往往举步维艰，主要原因是需要足够的历史数据。因此，基于聚类的无监督缺陷预测（CUDP）和跨项目缺陷预测（CPDP）应运而生，以应对这一挑战。然而，前者在捕捉语义和结构特征方面表现出局限性，而后者则因跨项目数据分布的差异而遇到限制。因此，我们为 SDP 引入了一种新的框架，即基于图嵌入特征的改进聚类（IC-GraF），而无需依赖历史数据。首先，进行预处理操作以提取程序依赖图（PDGs），并标记其中不同的依赖关系。其次，设计了改进的深度图 infomax（IDGI）模型，该模型是专门针对 SDP 的 DGI 模型的扩展，用于生成 PDGs 的图级表示。最后，采用基于启发式的 k-means 聚类算法对 IDGI 生成的特征进行分类。为了验证 IC-GraF 的功效，我们使用 F-measure 和 G-measure 作为评估标准，基于 24 个发布的 PROMISE 数据集进行了实验。结果表明，与现有的 CUDP 方法相比，IC-GraF 的 F-measure 高出 5.0%-42.7%，G-measure 高出 5%-39.4%，AUC 高出 2.5%-11.4%。即使与八种基于监督学习的 SDP 方法相比，IC-GraF 也保持了卓越的竞争优势。


Citations: 0

IAPCP: An Effective Cross-Project Defect Prediction Model via Intra-Domain Alignment and Programming-Based Distribution Adaptation

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-09-03 DOI: 10.1049/2024/5358773

Nana Zhang, Kun Zhu, Dandan Zhu

Cross-project defect prediction (CPDP) aims to identify defect-prone software instances in one project (the target) using historical data collected from other software projects (the sources), which helps maintainers allocate limited testing resources sensibly. Unfortunately, the discrepancy in feature distribution between the source and target projects makes it hard to transfer a matching feature representation and severely hinders CPDP performance. In addition, existing CPDP models require an expensive and time-consuming process to tune many parameters. To address these limitations, we propose IAPCP, a CPDP model based on distribution adaptation that consists of two stages: correlation alignment and intra-domain programming. Correlation alignment first calculates the covariance matrices of the source and target projects, then erases the source project's own feature correlations (i.e., a whitening operation) and fills in those of the target project (i.e., re-coloring with the target covariance), thereby aligning the source and target feature distributions and reducing the distribution discrepancy across projects. Intra-domain programming directly learns a nonparametric linear transfer defect predictor with strong discriminative capacity by solving a probabilistic annotation matrix (PAM) based on the adjusted features of the source project. The model requires neither model selection nor parameter tuning. Experiments on 82 cross-project pairs from 16 software projects show that IAPCP achieves competitive CPDP effectiveness and efficiency compared with multiple state-of-the-art baseline models.
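The whitening and re-coloring idea behind the correlation-alignment stage can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses Cholesky factors of the two covariance matrices instead of the symmetric matrix square roots usually seen in correlation alignment, it omits any regularization term, and every function name and the toy data are hypothetical.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of feature vectors (rows)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for symmetric positive-definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, x):
    """Solve low @ y = x for y by forward substitution."""
    y = []
    for i in range(len(x)):
        y.append((x[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def align_source_to_target(source, target):
    """Whiten the source rows, then re-color them with the target covariance."""
    l_src = cholesky(covariance(source))
    l_tgt = cholesky(covariance(target))
    aligned = []
    for row in source:
        white = forward_solve(l_src, row)  # removes source correlations
        aligned.append([sum(l_tgt[i][k] * white[k] for k in range(i + 1))
                        for i in range(len(row))])  # applies target correlations
    return aligned


# Toy data standing in for two projects' metric vectors (hypothetical values).
rng = random.Random(0)
source = [[rng.gauss(0, 1), rng.gauss(0, 3), rng.gauss(0, 0.5)] for _ in range(200)]
raw = [[rng.gauss(0, 2), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
target = [[a + 0.5 * b, b, c - a] for a, b, c in raw]
aligned = align_source_to_target(source, target)
```

After the transform, the covariance of `aligned` matches that of `target` up to floating-point error, so a predictor trained on the aligned source features sees second-order statistics like those of the target project.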



Citations: 0

Understanding Work Rhythms in Software Development and Their Effects on Technical Performance

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-08-30 DOI: 10.1049/2024/8846233

Jiayun Zhang, Qingyuan Gong, Yang Chen, Yu Xiao, Xin Wang, Aaron Yi Ding

The temporal patterns of code submissions, denoted as work rhythms, provide valuable insight into the work habits and productivity in software development. In this paper, we investigate the work rhythms in software development and their effects on technical performance by analyzing the profiles of developers and projects from 110 international organizations and their commit activities on GitHub. Using clustering, we identify four work rhythms among individual developers and three work rhythms among software projects. Strong correlations are found between work rhythms and work regions, seniority, and collaboration roles. We then define practical measures for technical performance and examine the effects of different work rhythms on them. Our findings suggest that moderate overtime is related to good technical performance, whereas fixed office hours are associated with receiving less attention. Furthermore, we survey 92 developers to understand their experience with working overtime and the reasons behind it. The survey reveals that developers often work longer than required. A positive attitude towards extended working hours is associated with situations that require addressing unexpected issues or when clear incentives are provided. In addition to the insights from our quantitative and qualitative studies, this work sheds light on tangible measures for both software companies and individual developers to improve the recruitment process, project planning, and productivity assessment.
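A work rhythm in the sense used here can be represented as a vector describing when commits happen. The sketch below is only one plausible way to build such a vector and is not the authors' pipeline: it assumes commit timestamps are already expressed in the developer's local time, and the office-hours window (09:00 to 18:00 on weekdays), the function names, and the sample timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_rhythm(timestamps):
    """Fraction of commits falling in each hour of the day (24 values)."""
    if not timestamps:
        raise ValueError("at least one commit timestamp is required")
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [counts.get(hour, 0) / len(timestamps) for hour in range(24)]


def overtime_share(timestamps, start_hour=9, end_hour=18):
    """Share of commits made on weekends or outside the assumed office hours."""
    if not timestamps:
        raise ValueError("at least one commit timestamp is required")
    outside = 0
    for ts in timestamps:
        moment = datetime.fromisoformat(ts)
        is_weekend = moment.weekday() >= 5  # Saturday or Sunday
        if is_weekend or not (start_hour <= moment.hour < end_hour):
            outside += 1
    return outside / len(timestamps)


# Hypothetical commit times: two during a Monday, one Tuesday evening,
# one on a Saturday.
commits = [
    "2024-03-04T10:15:00",
    "2024-03-04T14:40:00",
    "2024-03-05T21:05:00",
    "2024-03-09T11:30:00",
]
rhythm = hourly_rhythm(commits)
print(overtime_share(commits))
```

Vectors such as `rhythm`, computed per developer or per project, are the kind of input a clustering step could group into distinct rhythms.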



Citations: 0

Research and Application of Firewall Log and Intrusion Detection Log Data Visualization System

IF 1.5 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software

Pub Date : 2024-08-13 DOI: 10.1049/2024/7060298

Ma Mingze

This paper tackles current challenges in network security analysis by proposing an innovative information gain-based feature selection algorithm and leveraging visualization techniques to develop a network security log data visualization system. The system’s key functions include raw data collection for firewall logs and intrusion detection logs, data preprocessing, database management, data manipulation, data logic processing, and data visualization. Through statistical analysis of log data and the construction of visualization models, the system presents analysis results in diverse graphical formats while offering interactive capabilities. Seamlessly integrating data generation, processing, analysis, and display processes, the system demonstrates high accuracy, precision, recall, F1 score, and real-time performance metrics, reaching 98.3%, 92.1%, 97.5%, 98.1%, and 91.2%, respectively, in experimental evaluations. The proposed method significantly enhances real-time prediction capabilities of network security status and monitoring efficiency of network devices, providing a robust security assurance tool.
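Information gain, the criterion this abstract builds its feature selection on, measures how much knowing a feature reduces the entropy of the class label. The sketch below shows the textbook computation for categorical features; it is not the paper's algorithm, and the log field names (`protocol`, `action`) and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(records, labels):
    """Return (feature, gain) pairs sorted by descending information gain."""
    gains = {name: information_gain([r[name] for r in records], labels)
             for name in records[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, already-parsed log records with hand-assigned labels.
records = [
    {"protocol": "tcp", "action": "deny"},
    {"protocol": "tcp", "action": "allow"},
    {"protocol": "udp", "action": "deny"},
    {"protocol": "udp", "action": "allow"},
]
labels = ["attack", "normal", "attack", "normal"]
ranked = rank_features(records, labels)
print(ranked)
```

In this toy data `action` separates the two classes perfectly while `protocol` carries no information, so a selector that keeps only the top-ranked features would drop `protocol`.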



Citations: 0