Defect Prediction via Tree-Based Encoding with Hybrid Granularity for Software Sustainability

IEEE Transactions on Sustainable Computing, vol. 9, no. 3, pp. 249-260. Pub Date: 2023-02-24. DOI: 10.1109/TSUSC.2023.3248965
Impact factor: 3.0. Zone 3 (Computer Science). JCR Q2 (Computer Science, Hardware & Architecture).
Shaojian Qiu; Huihao Huang; Wenchao Jiang; Fanlong Zhang; Weilin Zhou
https://ieeexplore.ieee.org/document/10052729/

Abstract

Defects in software may result in system crashes, sluggish performance, or even deadlock, wasting valuable resources. Defect prediction can help quality assurance teams identify potential software issues and allocate testing resources more rationally, thereby reducing resource waste and enhancing software sustainability. Researchers have recently incorporated deep learning into defect prediction, extracting structural-semantic features from the abstract syntax trees (ASTs) of source code. However, inappropriate node granularity in ASTs may adversely impact the effectiveness of the extracted features. In addition, converting AST nodes into integer vectors may lose structural information, resulting in poor predictive capability. To address these challenges, this paper proposes a tree-based encoding method with hybrid granularity for defect prediction. Specifically, five granularity selection schemes are extended to generate various ASTs from source code. Subsequently, a tree-based continuous bag-of-words model is used to map AST nodes into numeric vector representations that conform to the tree-like structure of code. The matrices converted from the ASTs are then fed into a convolutional neural network to extract program features automatically. Experiments involving 24 versions of open-source projects demonstrate that the method can improve the effectiveness of extracted features in defect prediction tasks.
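The abstract does not spell out the five granularity schemes or the exact context window of the tree-based continuous bag-of-words model, so the following is only a minimal sketch under stated assumptions. It uses Python's standard `ast` module as a stand-in parser, a hypothetical coarse/fine switch in place of the paper's granularity schemes, and assumes that a node's context consists of its parent and its direct children. The names `node_token` and `tree_context_pairs` are illustrative and do not come from the paper.

```python
import ast


def node_token(node, fine_grained):
    """Map an AST node to a token.

    Coarse granularity keeps only the node type; fine granularity (a
    hypothetical scheme, not one of the paper's five) also keeps identifiers.
    """
    name = type(node).__name__
    if fine_grained:
        if isinstance(node, ast.Name):
            return f"{name}:{node.id}"
        if isinstance(node, ast.FunctionDef):
            return f"{name}:{node.name}"
        if isinstance(node, ast.Attribute):
            return f"{name}:{node.attr}"
    return name


def tree_context_pairs(source, fine_grained=False):
    """Return (context_tokens, target_token) pairs in pre-order.

    Assumption: the context of a node is its parent plus its direct
    children, so the window follows the tree rather than a flat sequence.
    """
    pairs = []

    def visit(node, parent):
        children = list(ast.iter_child_nodes(node))
        context = []
        if parent is not None:
            context.append(node_token(parent, fine_grained))
        context.extend(node_token(child, fine_grained) for child in children)
        if context:
            pairs.append((context, node_token(node, fine_grained)))
        for child in children:
            visit(child, node)

    visit(ast.parse(source), None)
    return pairs


if __name__ == "__main__":
    sample = "def f(x):\n    return x\n"
    for context, target in tree_context_pairs(sample, fine_grained=True):
        print(target, "<-", context)
```

Pairs of this shape could then be used to train a CBOW-style embedding that predicts each target token from its tree neighbours; stacking the resulting node vectors per file would give the kind of matrix that the abstract says is fed into the convolutional neural network.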
Source journal: IEEE Transactions on Sustainable Computing (subject area: Mathematics - Control and Optimization)
CiteScore: 7.70
Self-citation rate: 2.60%
Articles published: 54