A persistent challenge in blockchain technology is the trilemma of balancing decentralization, security, and scalability. This paper introduces a blockchain architecture designed to overcome this trilemma by combining advanced cryptographic methods, new security protocols, and dynamic decentralization mechanisms. By employing established techniques such as elliptic curve cryptography, a Schnorr verifiable random function, and zero-knowledge proofs (zk-SNARKs), alongside new methods for stake distribution, anomaly detection, and incentive alignment, our framework sets a new benchmark for secure, scalable, and decentralized blockchain ecosystems. The proposed system outperforms leading consensus mechanisms: it attains a throughput of more than 1,700 transactions per second, provides robust security against all well-known blockchain attacks without compromising scalability, and demonstrates solid decentralization in a benchmark analysis against 25 other blockchain systems, all with an affordable hardware cost for validators and an average CPU usage of only 16.1%.
{"title":"Breaking the Blockchain Trilemma: A Comprehensive Consensus Mechanism for Ensuring Security, Scalability, and Decentralization","authors":"Khandakar Md Shafin, Saha Reno","doi":"10.1049/2024/6874055","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6874055"}
Software defect prediction (SDP) has long been a prominent area of research in software engineering. Earlier SDP methods often struggled in industrial applications, primarily because they need sufficient historical data. Clustering-based unsupervised defect prediction (CUDP) and cross-project defect prediction (CPDP) emerged to address this challenge. However, the former is limited in capturing semantic and structural features, while the latter is constrained by differences in data distribution across projects. We therefore introduce a novel framework called improved clustering with graph-embedding-based features (IC-GraF) for SDP that does not rely on historical data. First, a preprocessing operation extracts program dependence graphs (PDGs) and marks the distinct dependency relationships within them. Second, the improved deep graph infomax (IDGI) model, an extension of the DGI model designed specifically for SDP, generates graph-level representations of the PDGs. Finally, a heuristic-based k-means clustering algorithm groups the features generated by IDGI into defect classes. To validate the efficacy of IC-GraF, we conduct experiments on 24 releases of the PROMISE dataset, using F-measure and G-measure as evaluation criteria. The findings indicate that IC-GraF achieves a 5.0% to 42.7% higher F-measure, a 5% to 39.4% higher G-measure, and a 2.5% to 11.4% higher AUC than existing CUDP methods. Even when compared with eight supervised learning-based SDP methods, IC-GraF remains competitive.
{"title":"IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction","authors":"Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin","doi":"10.1049/2024/8027037","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/8027037"}
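The IC-GraF abstract reports F-measure and G-measure without defining them. As a minimal sketch, assuming the definitions commonly used in defect-prediction studies (F-measure as the harmonic mean of precision and recall, G-measure as the harmonic mean of recall and specificity), the two scores can be computed from a confusion matrix as follows. The function names and the example counts are illustrative and do not come from the paper.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def g_measure(tp, fp, tn, fn):
    """Harmonic mean of recall and specificity (assumed definition)."""
    recall = tp / (tp + fn)         # share of defective modules that were found
    specificity = tn / (tn + fp)    # share of clean modules correctly left alone
    return 2 * recall * specificity / (recall + specificity)


# Illustrative confusion matrix for 200 modules (made-up counts).
tp, fp, tn, fn = 30, 20, 130, 20
print(round(f_measure(tp, fp, fn), 4))
print(round(g_measure(tp, fp, tn, fn), 4))
```

Because G-measure uses specificity rather than precision, it is less sensitive to the class imbalance typical of defect datasets, which may be why both scores are reported together.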
Cross-project defect prediction (CPDP) aims to identify defect-prone software instances in one project (the target) using historical data collected from other software projects (the source), which helps maintainers allocate limited testing resources sensibly. Unfortunately, the discrepancy in feature distribution between the source and target projects makes it difficult to transfer a matching feature representation and severely hinders CPDP performance. In addition, existing CPDP models require an expensive and time-consuming process to tune many parameters. To address these limitations, we propose IAPCP, an effective CPDP model based on distribution adaptation that consists of two stages: correlation alignment and intra-domain programming. Correlation alignment first calculates the covariance matrices of the source and target projects, then removes the source project's own feature correlations (the whitening operation) and re-colors the source features with the target covariance, thereby aligning the source and target feature distributions and reducing the distribution discrepancy across projects. Intra-domain programming directly learns a nonparametric linear transfer defect predictor with strong discriminative capacity by solving for a probabilistic annotation matrix (PAM) based on the adjusted features of the source project. The model requires neither model selection nor parameter tuning. Extensive experiments on 82 cross-project pairs from 16 software projects demonstrate that IAPCP achieves competitive CPDP effectiveness and efficiency compared with multiple state-of-the-art baseline models.
{"title":"IAPCP: An Effective Cross-Project Defect Prediction Model via Intra-Domain Alignment and Programming-Based Distribution Adaptation","authors":"Nana Zhang, Kun Zhu, Dandan Zhu","doi":"10.1049/2024/5358773","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5358773"}
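The correlation-alignment stage described in the IAPCP abstract (whiten the source features, then re-color them with the target covariance) can be illustrated with a small dependency-free sketch. This is not the authors' implementation. It assumes Cholesky factors in place of the symmetric matrix square roots usually used for this step, adds a tiny `eps` ridge for numerical stability, and returns mean-centered features. All function names and the toy data are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows]


def covariance(rows, eps=0.0):
    """Sample covariance matrix, with an optional ridge eps on the diagonal."""
    c = center(rows)
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)]
           for i in range(d)]
    for i in range(d):
        cov[i][i] += eps
    return cov


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low * y == b for lower-triangular low."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def align_source_to_target(source, target, eps=1e-9):
    """Whiten the centered source rows, then re-color with the target covariance."""
    l_src = cholesky(covariance(source, eps))
    l_tgt = cholesky(covariance(target, eps))
    aligned = []
    for x in center(source):
        white = forward_solve(l_src, x)                 # whitening step
        aligned.append([sum(l_tgt[i][k] * white[k] for k in range(i + 1))
                        for i in range(len(x))])        # re-coloring step
    return aligned


# Toy data: two features, five instances per project (made-up values).
source = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]]
target = [[10.0, 0.0], [12.0, 1.0], [11.0, 4.0], [15.0, 2.0], [13.0, 5.0]]
aligned = align_source_to_target(source, target)
print(covariance(aligned))
print(covariance(target))
```

After alignment, the covariance of the adjusted source features matches the target covariance up to the ridge term, so a predictor trained on the adjusted source sees second-order statistics like those of the target project.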
Jiayun Zhang, Qingyuan Gong, Yang Chen, Yu Xiao, Xin Wang, Aaron Yi Ding
The temporal patterns of code submissions, denoted as work rhythms, provide valuable insight into the work habits and productivity in software development. In this paper, we investigate the work rhythms in software development and their effects on technical performance by analyzing the profiles of developers and projects from 110 international organizations and their commit activities on GitHub. Using clustering, we identify four work rhythms among individual developers and three work rhythms among software projects. Strong correlations are found between work rhythms and work regions, seniority, and collaboration roles. We then define practical measures for technical performance and examine the effects of different work rhythms on them. Our findings suggest that moderate overtime is related to good technical performance, whereas fixed office hours are associated with receiving less attention. Furthermore, we survey 92 developers to understand their experience with working overtime and the reasons behind it. The survey reveals that developers often work longer than required. A positive attitude towards extended working hours is associated with situations that require addressing unexpected issues or when clear incentives are provided. In addition to the insights from our quantitative and qualitative studies, this work sheds light on tangible measures for both software companies and individual developers to improve the recruitment process, project planning, and productivity assessment.
{"title":"Understanding Work Rhythms in Software Development and Their Effects on Technical Performance","authors":"Jiayun Zhang, Qingyuan Gong, Yang Chen, Yu Xiao, Xin Wang, Aaron Yi Ding","doi":"10.1049/2024/8846233","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/8846233"}
This paper tackles current challenges in network security analysis by proposing an innovative information gain-based feature selection algorithm and leveraging visualization techniques to develop a network security log data visualization system. The system’s key functions include raw data collection for firewall logs and intrusion detection logs, data preprocessing, database management, data manipulation, data logic processing, and data visualization. Through statistical analysis of log data and the construction of visualization models, the system presents analysis results in diverse graphical formats while offering interactive capabilities. Seamlessly integrating data generation, processing, analysis, and display processes, the system demonstrates high accuracy, precision, recall, F1 score, and real-time performance metrics, reaching 98.3%, 92.1%, 97.5%, 98.1%, and 91.2%, respectively, in experimental evaluations. The proposed method significantly enhances real-time prediction capabilities of network security status and monitoring efficiency of network devices, providing a robust security assurance tool.
{"title":"Research and Application of Firewall Log and Intrusion Detection Log Data Visualization System","authors":"Ma Mingze","doi":"10.1049/2024/7060298","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/7060298"}
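The firewall-log abstract names an information gain-based feature selection algorithm but gives no details. The sketch below shows only the textbook information-gain criterion for categorical features, which is an assumption about the general idea and not the paper's specific algorithm. The log fields (`protocol`, `port_class`) and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(records, labels):
    """Return (gain, feature name) pairs sorted from most to least informative."""
    scored = [(information_gain([r[name] for r in records], labels), name)
              for name in records[0]]
    return sorted(scored, reverse=True)


# Hypothetical parsed log records and their labels.
records = [
    {"protocol": "tcp", "port_class": "high"},
    {"protocol": "tcp", "port_class": "low"},
    {"protocol": "udp", "port_class": "high"},
    {"protocol": "udp", "port_class": "low"},
]
labels = ["attack", "attack", "normal", "normal"]
for gain, name in rank_features(records, labels):
    print(name, round(gain, 3))
```

In this toy data `protocol` separates the two classes perfectly while `port_class` carries no information, so a selector that keeps the top-ranked features would retain only `protocol`.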
Long-term time series forecasting has received significant attention from researchers in recent years. Transformer model-based approaches have emerged as promising solutions in this domain. Nevertheless, most existing methods rely on point-by-point self-attention mechanisms or employ transformations, decompositions, and reconstructions of the entire sequence to capture dependencies. The point-by-point self-attention mechanism becomes impractical for long-term time series forecasting due to its quadratic complexity with respect to the time series length. Decomposition and reconstruction methods may introduce information loss, leading to performance bottlenecks in the models. In this paper, we propose a Transformer-based forecasting model called NPformer. Our method introduces a novel multiscale segmented Fourier attention mechanism. By segmenting the long-term time series and performing discrete Fourier transforms on different segments, we aim to identify frequency-domain correlations between these segments. This allows us to capture dependencies more effectively. In addition, we incorporate a normalization module and a desmoothing factor into the model. These components address the problem of oversmoothing that arises in sequence decomposition methods. Furthermore, we introduce an isometry convolution method to enhance the prediction accuracy of the model. The experimental results demonstrate that NPformer outperforms other Transformer-based methods in long-term time series forecasting.
{"title":"Segmented Frequency-Domain Correlation Prediction Model for Long-Term Time Series Forecasting Using Transformer","authors":"Haozhuo Tong, Lingyun Kong, Jie Liu, Shiyan Gao, Yilu Xu, Yuezhe Chen","doi":"10.1049/2024/2920167","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/2920167"}
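The NPformer abstract describes segmenting a long series, applying a discrete Fourier transform to each segment, and looking for frequency-domain correlations between segments. The sketch below illustrates only that general idea with a plain DFT and a cosine similarity between magnitude spectra. It is not NPformer's attention mechanism; the segment length, the similarity measure, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def dft(segment):
    """Naive discrete Fourier transform of a real-valued segment."""
    n = len(segment)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(segment))
            for k in range(n)]


def split_segments(series, length):
    """Cut the series into consecutive non-overlapping segments."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]


def spectral_similarity(seg_a, seg_b):
    """Cosine similarity between the magnitude spectra of two segments."""
    a = [abs(c) for c in dft(seg_a)]
    b = [abs(c) for c in dft(seg_b)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A sine wave with period 8, cut into segments of length 8.
series = [math.sin(2 * math.pi * t / 8) for t in range(32)]
segments = split_segments(series, 8)
# A segment with a different period (4) for comparison.
faster = [math.sin(2 * math.pi * t / 4) for t in range(8)]

print(round(spectral_similarity(segments[0], segments[1]), 6))
print(round(spectral_similarity(segments[0], faster), 6))
```

Segments that share the same dominant frequency score close to 1, while segments with disjoint spectra score close to 0. Repeating the comparison for several segment lengths would give a multiscale view of the same relationships.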
Accounting management and production optimization are vital aspects of enterprise management, serving as indispensable core components in the modern business landscape. However, conventional methods reliant on manual input exhibit drawbacks such as low recognition accuracy and excessive memory consumption. To address these challenges, semantic recognition technology utilizing voice signals has emerged as a pivotal solution across various industries. Building upon this premise, this paper introduces a distributed semantic recognition-based algorithm for accounting management and production optimization. The proposed algorithm encompasses multiple modules, including a front-end feature extraction module, a channel transmission module, and a voice quality vector quantization module. Additionally, a semantic recognition module is introduced to process the voice signals and generate prediction results. By leveraging extensive accounting management and production data for learning and analysis, the algorithm automatically uncovers patterns and laws within the data, extracting valuable information. To validate the proposed algorithm, this study utilizes the dataset from the UCI machine learning repository and applies it for analysis and processing. The experimental findings demonstrate that the algorithm introduced in this paper outperforms alternative methods. Specifically, it achieves a notable 9.3% improvement in comprehensive recognition accuracy and reduces memory usage by 34.4%. These results highlight the algorithm’s efficacy in enhancing the understanding and analysis of customer needs, market trends, competitors, and other pertinent information within the realm of commercial applications for companies.
{"title":"Accounting Management and Optimizing Production Based on Distributed Semantic Recognition","authors":"Ruina Guo, Shu Wang, Guangsen Wei","doi":"10.1049/2024/8425877","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/8425877"}
Distributed global snapshot (DGS) is one of the fundamental protocols in distributed systems. It is used in applications such as collecting information from a distributed system and taking checkpoints for process rollback. The Chandy–Lamport protocol (CLP) is a well-known protocol for taking a DGS. Its main aim is to generate consistent cuts without interrupting the regular operation of the distributed system, and it inspired many later protocols. The first aim of this paper is to propose a novel formal hierarchical parametric colored Petri net model of CLP, in which the number of constituent processes is a parameter. The second aim is to automatically generate a message sequence chart (MSC) that shows the detailed steps of each simulation run of the snapshot protocol. The third aim is to model check the proposed formal model to verify the correctness of both CLP and our colored Petri net model. Such tools greatly help in testing the correct operation of a newly proposed distributed snapshot protocol. The proposed model of CLP can readily be used to test visually the correct operation of a new DGS protocol under development, and it also permits formal verification of that protocol. The model can serve as a simple, powerful, and visual tool for running CLP step by step, for model checking, and for teaching the protocol to postgraduate students. The same approach applies to similarly complicated distributed protocols.
{"title":"Modeling Chandy–Lamport Distributed Snapshot Algorithm Using Colored Petri Net","authors":"Saeid Pashazadeh, Basheer Zuhair Jaafar Al-Basseer, Jafar Tanha","doi":"10.1049/2024/6582682","journal":{"name":"IET Software","volume":"2024 1"},"PeriodicalIF":1.3,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6582682"}
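To make the Chandy–Lamport marker rules concrete, here is a small deterministic simulation in plain Python. It is an illustrative sketch and unrelated to the colored Petri net model in the paper. It assumes reliable FIFO channels between every pair of processes and uses a token-transfer workload so that a consistent cut can be checked by conservation of the total. The class names, the workload, and the delivery schedule are all hypothetical.

```python
from collections import deque


class Process:
    def __init__(self, balance):
        self.balance = balance        # live local state
        self.recorded_balance = None  # local state captured by the snapshot
        self.channel_state = {}       # sender -> amounts recorded as in transit
        self.recording = set()        # senders whose channel is still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {pid: Process(b) for pid, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.channels = {(s, r): deque()
                         for s in balances for r in balances if s != r}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(("transfer", amount))

    def _record(self, pid):
        """Record local state, start recording all inputs, emit markers."""
        p = self.procs[pid]
        p.recorded_balance = p.balance
        for other in self.procs:
            if other != pid:
                p.channel_state[other] = []
                p.recording.add(other)
                self.channels[(pid, other)].append(("marker", None))

    def start_snapshot(self, pid):
        self._record(pid)

    def deliver(self, src, dst):
        kind, amount = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if kind == "transfer":
            p.balance += amount
            if src in p.recording:            # message crossed the cut
                p.channel_state[src].append(amount)
        else:
            if p.recorded_balance is None:    # first marker seen by this process
                self._record(dst)
            p.recording.discard(src)          # channel src -> dst is now closed

    def snapshot_total(self):
        return sum(p.recorded_balance
                   + sum(sum(v) for v in p.channel_state.values())
                   for p in self.procs.values())


def demo():
    net = Network({"A": 100, "B": 50, "C": 25})
    net.send("A", "B", 10)
    net.start_snapshot("A")
    net.send("B", "A", 5)     # sent before B records, received after A records
    net.send("C", "B", 7)
    schedule = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("B", "A"),
                ("C", "B"), ("C", "B"), ("B", "C"), ("C", "A")]
    for src, dst in schedule:
        net.deliver(src, dst)
    return net


net = demo()
print(net.snapshot_total(), sum(p.balance for p in net.procs.values()))
```

The recorded process states plus the recorded in-transit messages add up to the same total as the live system, which is the conservation property a consistent cut should satisfy for this workload.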
Qinhe Zhang, Jiachen Zhang, Tie Feng, Jialang Xue, Xinxin Zhu, Ningyang Zhu, Zhiheng Li
Machine learning-based software defect prediction (SDP) approaches are commonly proposed to help deliver high-quality software. Unfortunately, previous research conducted without effective feature reduction suffers from high-dimensional data, leading to unsatisfactory prediction performance. Moreover, without proper feature reduction, the interpretability and generalization ability of machine learning models in SDP may be compromised, hindering their practical utility in diverse software development environments. In this paper, an SDP approach using deep Q-learning network (DQN)-based feature extraction is proposed to eliminate irrelevant, redundant, and noisy features and to improve classification performance. In the data preprocessing phase, the BalanceCascade undersampling method is applied to divide the original datasets. As the first step of feature extraction, the weight ranking of all the metric elements is calculated according to the expected cross-entropy. Then, the relation matrix is constructed by applying random matrix theory. After that, the reward principle for computing the Q value in Q-learning is defined based on the weight ranking, the relation matrix, and the number of errors. According to this principle, a convolutional neural network model is trained until sequences of metric pairs have been generated for all datasets; these sequences act as the revised feature set. Experiments have been conducted on 11 NASA and 11 PROMISE repository datasets. Sensitivity analysis experiments show that binary classification algorithms based on SDP approaches that use the DQN-based feature extraction outperform those that do not. We also conducted experiments to compare our approach with four state-of-the-art approaches on common datasets, which show that our approach is superior to these methods in precision, F-measure, area under the receiver operating characteristic curve, and Matthews correlation coefficient.
{"title":"Software Defect Prediction Using Deep Q-Learning Network-Based Feature Extraction","authors":"Qinhe Zhang, Jiachen Zhang, Tie Feng, Jialang Xue, Xinxin Zhu, Ningyang Zhu, Zhiheng Li","doi":"10.1049/2024/3946655","journal":{"name":"IET Software","volume":"2024 1"},"publicationDate":"2024-05-30","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/3946655"}
Siyu Jiang, Jiapeng Zhang, Feng Guo, Teng Ouyang, Jing Li
Cross-project defect prediction (CPDP) is an active research area in software testing. It identifies defects in projects with limited labeled data (target projects) by using predictive models built from data-rich projects (source projects). Existing CPDP methods based on transfer learning mainly assume a unimodal distribution, that is, a feature distribution with a single obvious peak. In practice, however, the feature distribution of project samples often exhibits multiple peaks that cannot be ignored. Such a multimodal distribution makes it challenging to align distributions between different projects. To address this issue, we propose a balanced adversarial tight-matching model for CPDP. Specifically, the method employs multilinear conditioning to obtain the cross-covariance of features and classifier predictions, which captures the multimodal distribution of the features. Reducing the captured multimodal distribution differences requires pseudo-labels, but pseudo-labels are uncertain. We therefore add an auxiliary classifier and use a pseudo-label strategy designed to generate pseudo-labels with less uncertainty. Finally, the feature generator and the two classifiers undergo adversarial training to align the multimodal distributions of different projects. The method outperforms the state-of-the-art CPDP model on the benchmark dataset.
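As a rough illustration of the multilinear conditioning mentioned in the abstract, the sketch below uses the formulation common in conditional adversarial domain adaptation, where each sample's feature vector is combined with its classifier prediction through a flattened outer product. Whether the authors' model uses exactly this form is an assumption, and the function names `outer_flat`, `multilinear_condition`, and `mean_joint` are hypothetical.

```python
def outer_flat(f, g):
    """Flattened outer product of f and g, of length len(f) * len(g)."""
    return [fi * gj for fi in f for gj in g]


def multilinear_condition(features, predictions):
    """Joint feature-prediction representation for each sample.

    In an adversarial setup, this joint vector (rather than the feature
    alone) would be passed to the domain discriminator.
    """
    return [outer_flat(f, g) for f, g in zip(features, predictions)]


def mean_joint(joint):
    """Batch average of the outer products: an uncentered cross-covariance
    between features and predictions."""
    n = len(joint)
    return [sum(col) / n for col in zip(*joint)]


# Two samples with 2-dimensional features and 2-class predictions.
features = [[1.0, 2.0], [3.0, 4.0]]
predictions = [[1.0, 0.0], [0.0, 1.0]]  # one-hot here for readability

joint = multilinear_condition(features, predictions)
print(joint)
print(mean_joint(joint))
```

With one-hot predictions, each feature value lands in the slot of its predicted class, so samples from different classes occupy different parts of the joint vector. This is the sense in which conditioning on predictions exposes the per-class modes that a single unimodal alignment would blur together.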
{"title":"Balanced Adversarial Tight Matching for Cross-Project Defect Prediction","authors":"Siyu Jiang, Jiapeng Zhang, Feng Guo, Teng Ouyang, Jing Li","doi":"10.1049/2024/1561351","journal":{"name":"IET Software","volume":"2024 1"},"publicationDate":"2024-05-16","publicationTypes":"Journal Article","openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/1561351"}