
IET Software: Latest Publications

Software Defect Prediction Using Deep Q-Learning Network-Based Feature Extraction
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-30 | DOI: 10.1049/2024/3946655
Qinhe Zhang, Jiachen Zhang, Tie Feng, Jialang Xue, Xinxin Zhu, Ningyang Zhu, Zhiheng Li

Machine learning-based software defect prediction (SDP) approaches have been commonly proposed to help deliver high-quality software. Unfortunately, all the previous research conducted without effective feature reduction suffers from high-dimensional data, leading to unsatisfactory prediction performance measures. Moreover, without proper feature reduction, the interpretability and generalization ability of machine learning models in SDP may be compromised, hindering their practical utility in diverse software development environments. In this paper, an SDP approach using deep Q-learning network (DQN)-based feature extraction is proposed to eliminate irrelevant, redundant, and noisy features and improve the classification performance. In the data preprocessing phase, the undersampling method of BalanceCascade is applied to divide the original datasets. As the first step of feature extraction, the weight ranking of all the metric elements is calculated according to the expected cross-entropy. Then, the relation matrix is constructed by applying random matrix theory. After that, the reward principle for computing the Q value of Q-learning is defined in terms of the weight ranking, the relation matrix, and the number of errors, according to which a convolutional neural network model is trained until, for every dataset, a sequence of metric pairs is generated to act as the revised feature set. Various experiments have been conducted on 11 NASA and 11 PROMISE repository datasets. Sensitivity analysis experiments show that binary classification algorithms built on SDP approaches that use the DQN-based feature extraction outperform those that do not. We also conducted experiments to compare our approach with four state-of-the-art approaches on common datasets, which show that our approach is superior to these methods in precision, F-measure, area under the receiver operating characteristic curve, and Matthews correlation coefficient.
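To make the reinforcement-learning element concrete, the sketch below shows the standard tabular Q-learning update that a DQN approximates. The reward value and the reading of states and actions as candidate metrics are hypothetical placeholders, since the abstract does not spell out the exact reward formula derived from the weight ranking, relation matrix, and error count.

```python
# Minimal sketch of the tabular Q-learning update underlying a DQN-style
# feature selector. The reward here is a stand-in; the paper derives it from
# weight ranking, the relation matrix, and the error count.
import numpy as np

def q_learning_step(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Toy usage: 20 candidate metrics treated as the selector's states/actions.
n_metrics = 20
Q = np.zeros((n_metrics, n_metrics))
Q = q_learning_step(Q, state=3, action=7, reward=0.5, next_state=7)
```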

Citations: 0
Balanced Adversarial Tight Matching for Cross-Project Defect Prediction
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-05-16 | DOI: 10.1049/2024/1561351
Siyu Jiang, Jiapeng Zhang, Feng Guo, Teng Ouyang, Jing Li

Cross-project defect prediction (CPDP) is an attractive research area in software testing. It identifies defects in projects with limited labeled data (target projects) by utilizing predictive models from data-rich projects (source projects). Existing CPDP methods based on transfer learning mainly rely on the assumption of a unimodal distribution and consider the case where the feature distribution has one obvious peak. However, in actual situations, the feature distribution of project samples often exhibits multiple peaks that cannot be ignored. It manifests as a multimodal distribution, making it challenging to align distributions between different projects. To address this issue, we propose a balanced adversarial tight-matching model for CPDP. Specifically, this method employs multilinear conditioning to obtain the cross-covariance of both features and classifier predictions, capturing the multimodal distribution of the feature. When reducing the captured multimodal distribution differences, pseudo-labels are needed, but pseudo-labels have uncertainty. Therefore, we additionally add an auxiliary classifier and attempt to generate pseudo-labels using a pseudo-label strategy with less uncertainty. Finally, the feature generator and two classifiers undergo adversarial training to align the multimodal distributions of different projects. This method outperforms the state-of-the-art CPDP model used on the benchmark dataset.
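For readers unfamiliar with multilinear conditioning, the sketch below illustrates its usual formulation in conditional adversarial adaptation: each sample's feature vector and classifier prediction are combined by an outer product before being passed to the domain discriminator. The shapes and variable names are illustrative assumptions, not taken from the paper.

```python
# Sketch of a multilinear conditioning step: the discriminator sees the joint
# (feature, prediction) distribution rather than features alone.
import numpy as np

def multilinear_map(features, predictions):
    """features: (B, d); predictions: (B, c) softmax outputs -> (B, d*c) conditioned input."""
    outer = np.einsum('bd,bc->bdc', features, predictions)  # per-sample outer product
    return outer.reshape(features.shape[0], -1)

feats = np.random.rand(8, 64)               # hidden features of 8 modules (placeholder)
preds = np.random.rand(8, 2)
preds /= preds.sum(axis=1, keepdims=True)   # pretend softmax over defect / non-defect
conditioned = multilinear_map(feats, preds) # shape (8, 128), fed to the discriminator
```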

Citations: 0
An Empirical Study on Downstream Dependency Package Groups in Software Packaging Ecosystems
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-30 | DOI: 10.1049/2024/4488412
Qing Qi, Jian Cao

The role of focal packages in packaging ecosystems is crucial for the development of the entire ecosystem, as they are the packages on which other packages depend. However, the evolution of dependency groups in packaging ecosystems has not been systematically investigated. In this study, we examine the downstream dependency package groups (DDGs) in three typical packaging ecosystems—Cargo for Rust, Comprehensive Perl Archive Network for Perl, and RubyGems for Ruby—to identify their features and evolution. We also identify and analyze a special type of DDG, the collaborative downstream dependency package group (CDDG), which requires shared contributors. Our findings show that the overall development of DDGs, particularly CDDGs, is consistent with the status of the whole ecosystem, and the size of DDGs and CDDGs follows a power law distribution. Furthermore, the interaction mechanisms between focal packages and downstream packages differ between ecosystems, but focal packages always play a leading role in the development of DDGs and CDDGs. Finally, we investigate predictive models for the development of CDDGs in the next stage based on their features, and our results show that random forest and Gradient Boosting Regression Tree achieve acceptable prediction accuracy. We provide the raw data and scripts used for our analysis at https://github.com/onion616/DDG.
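As an illustration of the prediction step reported above, the following sketch fits random forest and gradient-boosted regression trees to synthetic group features with scikit-learn and compares held-out error. The feature names and data are invented; the study's real features and data are available at the linked repository.

```python
# Illustrative next-stage CDDG growth prediction with the two model families
# the study reports; all data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((500, 4))   # e.g. group size, shared contributors, focal-package releases, age
y = X @ np.array([3.0, 2.0, 1.0, 0.5]) + rng.normal(0, 0.1, 500)  # synthetic "next-stage size"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for model in (RandomForestRegressor(n_estimators=200, random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```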

Citations: 0
Exploiting DBSCAN and Combination Strategy to Prioritize the Test Suite in Regression Testing
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-04 | DOI: 10.1049/2024/9942959
Zikang Zhang, Jinfu Chen, Yuechao Gu, Zhehao Li, Rexford Nii Ayitey Sosu

Test case prioritization techniques improve the fault detection rate by adjusting the execution sequence of test cases. For static black-box test case prioritization, existing methods generally improve the fault detection rate by increasing the early diversity of execution sequences based on string distance differences. However, such methods have a high time overhead and are less stable. This paper proposes a novel test case prioritization method (DC-TCP) based on density-based spatial clustering of applications with noise (DBSCAN) and combination policies. A combination strategy is introduced to model the inputs and generate a mapping model, so that test inputs are mapped to consistent types and generality is improved. The DBSCAN method is then used to refine the classification of test cases further, and finally, the Firefly search strategy is introduced to improve the effectiveness of sequence merging. Extensive experimental results demonstrate that the proposed DC-TCP method outperforms other methods in terms of the average percentage of faults detected and exhibits advantages in time efficiency when compared to several existing static black-box prioritization methods.
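A minimal sketch of the clustering half of such a pipeline is shown below: test cases embedded as vectors are grouped with DBSCAN and then interleaved across clusters so that early positions cover diverse behaviours. The embedding and the simple round-robin interleaving are stand-ins for the paper's combination strategy and Firefly-based merging, which the abstract does not detail.

```python
# Cluster test cases with DBSCAN, then interleave clusters to diversify the
# front of the prioritized suite. Noise points (label -1) form their own group.
import numpy as np
from sklearn.cluster import DBSCAN

def prioritize(test_vectors):
    labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(test_vectors)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    order, queues = [], [list(v) for v in clusters.values()]
    while any(queues):                      # round-robin across clusters
        for q in queues:
            if q:
                order.append(q.pop(0))
    return order

suite = np.random.rand(30, 8)               # 30 test cases, 8 mapped input features
print(prioritize(suite))
```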

Citations: 0
An Expository Examination of Temporally Evolving Graph-Based Approaches for the Visual Investigation of Autonomous Driving
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-20 | DOI: 10.1049/2024/5802816
Li Wan, Wenzhi Cheng

With the continuous advancement of autonomous driving technology, visual analysis techniques have emerged as a prominent research topic. The data generated by autonomous driving is large-scale and time-varying, yet existing visual analytics methods are not sufficient to deal with such complex data effectively. Time-varying graphs can be used to model and visualize the dynamic relationships in various complex systems and can visually describe the data trends in autonomous driving systems. To this end, this paper introduces a time-varying graph-based method for visual analysis in autonomous driving. The proposed method employs a graph structure to represent the relative positional relationships between the target and obstacle interferences. By incorporating the time dimension, a time-varying graph model is constructed. The method explores the characteristic changes of nodes in the graph at different time instances, establishing feature expressions that differentiate target and obstacle motion patterns. The analysis demonstrates that the eigenvector centrality of nodes in the time-varying graph effectively captures the distinctions in motion patterns between targets and obstacles. These features can be utilized for accurate target and obstacle recognition, achieving high recognition accuracy. To evaluate the proposed time-varying graph-based visual analytic autopilot method, a comparative study is conducted against traditional visual analytic methods such as the frame differencing method and advanced visual analytic methods like visual lidar odometry and mapping. Robustness, accuracy, and resource consumption experiments are performed on the publicly available KITTI dataset to analyze and compare the three methods. The experimental results show that the proposed time-varying graph-based method exhibits superior accuracy and robustness. This study offers valuable insights and solution ideas for the deep integration of intelligent connected vehicles and intelligent transportation, and provides a reference for advancing intelligent transportation systems and their integration with autonomous driving technologies.
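As a rough illustration of the graph construction and centrality feature described above, the sketch below builds one proximity-weighted graph per time step over a target and surrounding obstacles and computes eigenvector centrality with NetworkX. The positions and the inverse-distance edge weighting are invented for the example; the paper evaluates on KITTI.

```python
# Per-frame graph over the ego target and obstacles; eigenvector centrality
# tracked over time serves as a motion-pattern feature.
import networkx as nx
import numpy as np

def frame_centrality(positions):
    """positions: dict node_id -> (x, y) for one time step."""
    G = nx.Graph()
    ids = list(positions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            dist = np.linalg.norm(np.subtract(positions[a], positions[b]))
            G.add_edge(a, b, weight=1.0 / (dist + 1e-6))  # closer objects interact more
    return nx.eigenvector_centrality(G, weight='weight', max_iter=500)

track = []
for t in range(5):  # five frames of a toy time-varying graph
    frame = {'target': (0.0, t * 1.0), 'obs1': (2.0, t * 0.5), 'obs2': (-3.0, 1.0)}
    track.append(frame_centrality(frame))
print(track[-1])
```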

Citations: 0
Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-18 | DOI: 10.1049/2024/5550801
Hongwei Tao, Lianyou Fu, Qiaoling Cao, Xiaoxu Niu, Haoran Chen, Songtao Shang, Yang Xian

With the increasing number of software projects, within-project defect prediction (WPDP) has already been unable to meet the demand, and cross-project defect prediction (CPDP) is playing an increasingly significant role in the area of software engineering. The classic CPDP methods mainly concentrated on applying metric features to predict defects. However, these approaches failed to consider the rich semantic information, which usually contains the relationship between software defects and context. Since traditional methods are unable to exploit this characteristic, their performance is often unsatisfactory. In this paper, a transfer long short-term memory (TLSTM) network model is first proposed. Transfer semantic features are extracted by adding a transfer learning algorithm to the long short-term memory (LSTM) network. Then, the traditional metric features and semantic features are combined for CPDP. First, the abstract syntax trees (AST) are generated from the source code. Second, the AST node contents are converted into integer vectors as inputs to the TLSTM model. Then, the semantic features of the program can be extracted by TLSTM. On the other hand, transferable metric features are extracted by transfer component analysis (TCA). Finally, the semantic features and metric features are combined and input into a logistic regression (LR) classifier for training. The presented TLSTM model performs better on the F-measure indicator than other machine and deep learning models, according to results on several open-source projects from the PROMISE repository. The TLSTM model built with a single feature achieves 0.7% and 2.1% improvement on Log4j-1.2 and Xalan-2.7, respectively. When combined features are used to train the prediction model, we call it transfer long short-term memory for defect prediction (DPTLSTM). DPTLSTM achieves a 2.9% and 5% improvement on Synapse-1.2 and Xerces-1.4.4, respectively. Both results prove the superiority of the proposed model on the CPDP task. This is because the LSTM captures long-term dependencies in sequence data and extracts features that contain source code structure and context information. It can be concluded that: (1) the TLSTM model has the advantage of preserving information, which can better retain the semantic features related to software defects; (2) compared with a CPDP model trained only on traditional metric features, the model's performance can be effectively enhanced by combining semantic features and metric features.
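The final fusion step lends itself to a short sketch: concatenate the semantic feature matrix with the TCA-aligned metric features and train a logistic regression classifier. Both feature matrices below are random placeholders; in the paper they come from the TLSTM network and transfer component analysis, respectively.

```python
# Combine semantic and metric features, then train the LR classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
semantic = rng.random((200, 32))   # TLSTM semantic features per module (placeholder)
metric = rng.random((200, 20))     # TCA-aligned traditional metrics (placeholder)
labels = rng.integers(0, 2, 200)   # defective / non-defective

X = np.hstack([semantic, metric])  # combined feature representation
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```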

Citations: 0
Design and Efficacy of a Data Lake Architecture for Multimodal Emotion Feature Extraction in Social Media
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-08 | DOI: 10.1049/2024/6819714
Yuanyuan Fan, Xifeng Mi

In the rapidly evolving landscape of social media, the demand for precise sentiment analysis (SA) on multimodal data has become increasingly pivotal. This paper introduces a sophisticated data lake architecture tailored for efficient multimodal emotion feature extraction, addressing the challenges posed by diverse data types. The proposed framework encompasses a robust storage solution and an innovative SA model, multilevel spatial attention fusion (MLSAF), adept at handling text and visual data concurrently. The data lake architecture comprises five layers, facilitating real-time and offline data collection, storage, processing, standardized interface services, and data mining analysis. The MLSAF model, integrated into the data lake architecture, utilizes a novel approach to SA. It employs a text-guided spatial attention mechanism, fusing textual and visual features to discern subtle emotional interplays. The model’s end-to-end learning approach and attention modules contribute to its efficacy in capturing nuanced sentiment expressions. Empirical evaluations on established multimodal sentiment datasets, MVSA-Single and MVSA-Multi, validate the proposed methodology’s effectiveness. Comparative analyses with state-of-the-art models showcase the superior performance of our approach, with an accuracy improvement of 6% on MVSA-Single and 1.6% on MVSA-Multi. This research significantly contributes to optimizing SA in social media data by offering a versatile and potent framework for data management and analysis. The integration of MLSAF with a scalable data lake architecture presents a strategic innovation poised to navigate the evolving complexities of social media data analytics.
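A plain-NumPy sketch of a text-guided spatial attention step of the kind MLSAF is described as using is given below: a pooled text vector scores each spatial location of a visual feature map, and the softmax-weighted sum yields a text-conditioned visual summary. The dimensions are illustrative assumptions rather than the paper's actual configuration.

```python
# Text-guided spatial attention over a flattened visual feature map.
import numpy as np

def text_guided_spatial_attention(visual_map, text_vec):
    """visual_map: (H*W, d); text_vec: (d,) -> attended visual vector (d,) and weights."""
    scores = visual_map @ text_vec             # relevance of each location to the text
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over spatial positions
    return weights @ visual_map, weights

visual = np.random.rand(49, 128)    # e.g. a 7x7 CNN feature map, flattened
text = np.random.rand(128)          # pooled text embedding of the post
fused, attn = text_guided_spatial_attention(visual, text)
print(fused.shape, attn.shape)      # (128,), (49,)
```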

Citations: 0
Unveiling the Dynamics of Extrinsic Motivations in Shaping Future Experts' Contributions to Developer Q&A Communities
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1049/2024/8354862
Yi Yang, Xinjun Mao, Menghan Wu

Developer question-and-answer communities rely on experts to provide helpful answers. However, these communities face a shortage of experts. To cultivate more experts, the community needs to quantify and analyze how extrinsic motivations influence the ongoing contributions of those developers who can become experts in the future (potential experts). Currently, there is a lack of potential-expert-centred research on community incentives. To address this gap, we propose a motivational impact model with self-determination theory-based hypotheses to explore the impact of five extrinsic motivations (badge, status, learning, reputation, and reciprocity) on potential experts. We develop a status-based timeline partitioning method to extract information on the sustained contributions of potential experts from Stack Overflow data and propose a multifactor assessment model to examine the motivational impact model and determine the relationship between potential experts' extrinsic motivations and sustained contributions. Our results show that (i) badge and reciprocity promote the continuous contributions of potential experts, while reputation and status reduce their contributions; (ii) status significantly affects the impact of reciprocity on potential experts' contributions; (iii) the difference in the influence of extrinsic motivations between potential experts and active developers lies in the influence of reputation, learning, and status and its moderating effect. Based on these findings, we recommend that community managers identify potential experts early and optimize reputation and status incentives to incubate more experts.
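The moderation finding in (ii) corresponds to an interaction term in a multifactor regression. The sketch below shows one conventional way to test such a moderating effect with statsmodels; the column names, the generated data, and the ordinary-least-squares form are assumptions for illustration, not the study's exact model.

```python
# Regress sustained contribution on the five motivations, with status
# moderating reciprocity via an interaction term (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((300, 5)),
                  columns=["badge", "status", "learning", "reputation", "reciprocity"])
df["contribution"] = (0.4 * df.badge + 0.3 * df.reciprocity - 0.2 * df.reputation
                      - 0.3 * df.status * df.reciprocity + rng.normal(0, 0.05, 300))

model = smf.ols("contribution ~ badge + status + learning + reputation + reciprocity"
                " + status:reciprocity", data=df).fit()
print(model.summary().tables[1])   # the status:reciprocity row captures the moderating effect
```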

Citations: 0
A Meta-Model Architecture and Elimination Method for Uncertainty Modeling
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-01-12 | DOI: 10.1049/2024/5591449
Haoran Shi, Shijun Liu, Li Pan

Uncertainty exists widely in various fields, especially in industrial manufacturing. From traditional manufacturing to intelligent manufacturing, uncertainty always exists in the manufacturing process. With the integration of rapidly developing intelligent technology, the complexity of manufacturing scenarios is increasing, and the postdecision method cannot fully meet the needs of the high reliability of the process. It is necessary to research the pre-elimination of uncertainty to ensure the reliability of process execution. Here, we analyze the sources and characteristics of uncertainty in manufacturing scenarios and propose a meta-model architecture and uncertainty quantification (UQ) framework for uncertainty modeling. On the one hand, our approach involves the creation of a meta-model structure that incorporates various strategies for uncertainty elimination (UE). On the other hand, we develop a comprehensive UQ framework that utilizes quantified metrics and outcomes to bolster the UE process. Finally, a deterministic model is constructed to guide and drive the process execution, which can achieve the purpose of controlling the uncertainty in advance and ensuring the reliability of the process. In addition, two typical manufacturing process scenarios are modeled, and quantitative experiments are conducted on a simulated production line and open-source data sets, respectively, to illustrate the idea and feasibility of the proposed approach. The proposed UE approach, which innovatively combines the domain modeling from the software engineering field and the probability-based UQ method, can be used as a general tool to guide the reliable execution of the process.
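Purely as an illustration of how a meta-model element might couple a quantified uncertainty score with a pluggable elimination strategy, consider the sketch below. Every class and strategy name is invented for the example; the paper's meta-model architecture is considerably richer than this toy.

```python
# Toy meta-model element with a UQ score and a registry of elimination strategies
# applied before process execution.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class UncertainElement:
    name: str
    value: float
    uq_score: float                      # quantified uncertainty, e.g. variance or entropy
    source: str = "sensor"               # where the uncertainty comes from

@dataclass
class UncertaintyEliminator:
    strategies: Dict[str, Callable[[UncertainElement], UncertainElement]] = field(default_factory=dict)

    def register(self, source: str, strategy: Callable[[UncertainElement], UncertainElement]) -> None:
        self.strategies[source] = strategy

    def eliminate(self, elem: UncertainElement, threshold: float = 0.1) -> UncertainElement:
        if elem.uq_score > threshold and elem.source in self.strategies:
            return self.strategies[elem.source](elem)   # pre-eliminate before execution
        return elem

eliminator = UncertaintyEliminator()
eliminator.register("sensor", lambda e: UncertainElement(e.name, round(e.value, 1), 0.0, e.source))
print(eliminator.eliminate(UncertainElement("spindle_temp", 71.34, 0.4)))
```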

Citations: 0
VdaBSC: A Novel Vulnerability Detection Approach for Blockchain Smart Contract by Dynamic Analysis
IF 1.6 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-12-29 | DOI: 10.1049/2023/6631967
Rexford Nii Ayitey Sosu, Jinfu Chen, Edward Kwadwo Boahen, Zikang Zhang
Smart contracts have gained immense popularity in recent years as self-executing programs that operate on a blockchain. However, they are not immune to security flaws, which can result in significant financial losses. These flaws can be detected using dynamic analysis methods that extract various aspects from smart contract bytecode. Methods currently used for identifying vulnerabilities in smart contracts mostly rely on static analysis methods that search for predefined vulnerability patterns. However, these patterns often fail to capture complex vulnerabilities, leading to a high rate of false negatives. To overcome this limitation, researchers have explored machine learning-based methods. However, the accurate interpretation of complex logic and structural information in smart contract code remains a challenge. In this study, we present a technique that combines real-time runtime batch normalization and data augmentation for data preprocessing, along with n-grams and one-hot encoding for feature extraction of opcode sequence information from the bytecode. We then combined bidirectional long short-term memory (BiLSTM), convolutional neural network, and the attention mechanism for vulnerability detection and classification. Additionally, our model includes a gated recurrent units memory module that enhances efficiency using historical execution data from the contract. Our results demonstrate that our proposed model effectively identifies smart contract vulnerabilities.
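The opcode-sequence preprocessing mentioned above can be sketched in a few lines: slide an n-gram window over the opcode trace and one-hot encode the resulting n-grams before they reach the BiLSTM/CNN stage. The opcode list here is a tiny invented example rather than real contract bytecode.

```python
# Bigram extraction and one-hot encoding of an opcode trace.
import numpy as np

def opcode_ngrams(opcodes, n=2):
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def one_hot(ngrams, vocab):
    index = {g: i for i, g in enumerate(vocab)}
    mat = np.zeros((len(ngrams), len(vocab)))
    for row, g in enumerate(ngrams):
        mat[row, index[g]] = 1.0
    return mat

trace = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO"]
grams = opcode_ngrams(trace, n=2)
vocab = sorted(set(grams))
features = one_hot(grams, vocab)        # (num_ngrams, vocab_size), input to the BiLSTM/CNN stage
print(features.shape)
```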
Citations: 0