Heterogeneous Graph Neural Networks for Software Effort Estimation

H. Phan, A. Jannesari
DOI: 10.1145/3544902.3546248
Published in: Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
Publication date: 2022-06-22
URL: https://doi.org/10.1145/3544902.3546248
Citation count: 3

Abstract

Background. Software effort can be measured in story points [35], and story point estimation is important for planning software projects. Current approaches to automatic story point estimation apply pre-trained embedding models and deep learning for text regression. These approaches require expensive embedding models, and a plain sequence of text may not be an efficient representation of software issues, which can combine text and code.

Aims. We propose HeteroSP, a tool for estimating story points from the textual input of Agile software project issues. We select GPT2SP [12] and Deep-SE [8] as the baselines for comparison.

Method. First, from an analysis of the story point dataset [8], we conclude that software issues are a mixture of natural language sentences and quoted code snippets, and that they suffer from a large vocabulary. Second, we provide a module to normalize the input text, including the words and code tokens of the software issues. Third, we design an algorithm to convert an input software issue into a graph with different types of nodes and edges. Fourth, we construct a heterogeneous graph neural network model, using fastText [6] to build the initial node embeddings, to learn and predict the story points of new issues.

Results. We compare HeteroSP with the baseline approaches in three estimation scenarios: within-project, cross-project within a repository, and cross-project across repositories. HeteroSP achieves an average Mean Absolute Error (MAE) of 2.38, 2.61, and 2.63 in the three scenarios, respectively. We outperform GPT2SP in two of the three scenarios and outperform Deep-SE in the most challenging scenario, with significantly less running time. We also compare our approach with different homogeneous graph neural network models; the results show that the heterogeneous graph neural network model outperforms the homogeneous models in story point estimation. For time performance, HeteroSP takes about 570 seconds across its three processes: node embedding initialization, model construction, and story point estimation. HeteroSP's artifacts are available at [22].

Conclusion. HeteroSP, a heterogeneous graph neural network model for story point estimation, achieves good accuracy and running time.
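To make the Method and Results steps more concrete, the following is a minimal sketch of turning an issue into a graph with typed nodes and edges, and of the MAE metric used for evaluation. It is not the authors' algorithm: the abstract does not specify the node types, edge types, or normalization rules, so the `CODE_PATTERN` heuristic, the two node types (`word`, `code`), the adjacency-based edge types, and all function names here are assumptions chosen only for illustration. The graph neural network itself and the fastText embeddings are omitted.

```python
import re
from collections import defaultdict

# Assumption: a token "looks like code" if it contains typical code
# punctuation or a camelCase boundary. The paper's real rules may differ.
CODE_PATTERN = re.compile(r"[_.()]|[a-z][A-Z]")


def normalize(text):
    """Split on whitespace and strip surrounding prose punctuation."""
    tokens = []
    for raw in text.split():
        tok = raw.strip(",;:!?\"'").rstrip(".")
        if tok:
            tokens.append(tok)
    return tokens


def build_graph(text):
    """Return (nodes, edges) for one issue.

    nodes maps each distinct token to a node type ("word" or "code").
    edges maps an edge type such as "word-next-code" to a set of
    (source, target) pairs for tokens that appear next to each other.
    """
    tokens = normalize(text)
    nodes = {}
    for tok in tokens:
        nodes[tok] = "code" if CODE_PATTERN.search(tok) else "word"

    edges = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        edge_type = f"{nodes[left]}-next-{nodes[right]}"
        edges[edge_type].add((left, right))
    return nodes, dict(edges)


def mean_absolute_error(actual, predicted):
    """MAE: the average of |actual - predicted| over all issues."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical issue text, invented for this example.
issue = "Calling parser.parse() throws NullPointerException on empty input"
nodes, edges = build_graph(issue)
for edge_type in sorted(edges):
    print(edge_type, sorted(edges[edge_type]))
```

In a real pipeline, each node type would receive its own initial embedding and each edge type its own message-passing weights, which is what distinguishes a heterogeneous graph model from a homogeneous one that treats all nodes and edges alike.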