Heterogeneous Graph Neural Networks for Software Effort Estimation

Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement Pub Date : 2022-06-22 DOI:10.1145/3544902.3546248

H. Phan, A. Jannesari

{"title":"Heterogeneous Graph Neural Networks for Software Effort Estimation","authors":"H. Phan, A. Jannesari","doi":"10.1145/3544902.3546248","DOIUrl":null,"url":null,"abstract":"Background. Software effort can be measured by story point [35]. Story point estimation is important in software projects’ planning. Current approaches for automatically estimating story points focus on applying pre-trained embedding models and deep learning for text regression to solve this problem. These approaches require expensive embedding models and confront challenges that the sequence of text might not be an efficient representation for software issues which can be the combination of text and code. Aims. We propose HeteroSP, a tool for estimating story points from textual input of Agile software project issues. We select GPT2SP [12] and Deep-SE [8] as the baselines for comparison. Method. First, from the analysis of the story point dataset [8], we conclude that software issues are actually a mixture of natural language sentences with quoted code snippets and have problems related to large-size vocabulary. Second, we provide a module to normalize the input text including words and code tokens of the software issues. Third, we design an algorithm to convert an input software issue to a graph with different types of nodes and edges. Fourth, we construct a heterogeneous graph neural networks model with the support of fastText [6] for constructing initial node embedding to learn and predict the story points of new issues. Results. We did the comparison over three scenarios of estimation, including within project, cross-project within the repository, and cross-project cross repository with our baseline approaches. We achieve the average Mean Absolute Error (MAE) as 2.38, 2.61, and 2.63 for three scenarios. We outperform GPT2SP in 2/3 of the scenarios while outperforming Deep-SE in the most challenging scenario with significantly less amount of running time. We also compare our approaches with different homogeneous graph neural network models and the results show that the heterogeneous graph neural networks model outperforms the homogeneous models in story point estimation. For time performance, we achieve about 570 seconds as the time performance in both three processes: node embedding initialization, model construction, and story point estimation. HeterpSP’s artifacts are available at [22]. Conclusion. HeteroSP, a heterogeneous graph neural networks model for story point estimation, achieved good accuracy and running time.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3544902.3546248","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

Background. Software effort can be measured by story point [35]. Story point estimation is important in software projects’ planning. Current approaches for automatically estimating story points focus on applying pre-trained embedding models and deep learning for text regression to solve this problem. These approaches require expensive embedding models and confront challenges that the sequence of text might not be an efficient representation for software issues which can be the combination of text and code. Aims. We propose HeteroSP, a tool for estimating story points from textual input of Agile software project issues. We select GPT2SP [12] and Deep-SE [8] as the baselines for comparison. Method. First, from the analysis of the story point dataset [8], we conclude that software issues are actually a mixture of natural language sentences with quoted code snippets and have problems related to large-size vocabulary. Second, we provide a module to normalize the input text including words and code tokens of the software issues. Third, we design an algorithm to convert an input software issue to a graph with different types of nodes and edges. Fourth, we construct a heterogeneous graph neural networks model with the support of fastText [6] for constructing initial node embedding to learn and predict the story points of new issues. Results. We did the comparison over three scenarios of estimation, including within project, cross-project within the repository, and cross-project cross repository with our baseline approaches. We achieve the average Mean Absolute Error (MAE) as 2.38, 2.61, and 2.63 for three scenarios. We outperform GPT2SP in 2/3 of the scenarios while outperforming Deep-SE in the most challenging scenario with significantly less amount of running time. We also compare our approaches with different homogeneous graph neural network models and the results show that the heterogeneous graph neural networks model outperforms the homogeneous models in story point estimation. For time performance, we achieve about 570 seconds as the time performance in both three processes: node embedding initialization, model construction, and story point estimation. HeterpSP’s artifacts are available at [22]. Conclusion. HeteroSP, a heterogeneous graph neural networks model for story point estimation, achieved good accuracy and running time.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

异构图神经网络用于软件工作量估算

背景。软件工作可以通过故事点[35]来度量。故事点评估在软件项目的规划中是很重要的。当前自动估计故事点的方法主要集中在应用预训练的嵌入模型和文本回归的深度学习来解决这个问题。这些方法需要昂贵的嵌入模型，并面临着文本序列可能不是软件问题(可能是文本和代码的组合)的有效表示的挑战。目标我们提出了HeteroSP，一个从敏捷软件项目问题的文本输入中估计故事点的工具。我们选择GPT2SP[12]和Deep-SE[8]作为基线进行比较。方法。首先，从对故事点数据集[8]的分析中，我们得出结论，软件问题实际上是自然语言句子与引用代码片段的混合，并且存在与大词汇量相关的问题。其次，我们提供了一个模块来规范输入文本，包括单词和代码标记的软件问题。第三，我们设计了一种算法，将输入软件问题转换为具有不同类型节点和边的图。第四，在fastText[6]的支持下，构建异构图神经网络模型，构建初始节点嵌入，学习和预测新问题的故事点。结果。我们对三种评估场景进行了比较，包括项目内、存储库内的跨项目，以及使用我们的基线方法的跨项目跨存储库。在三种情况下，平均绝对误差(MAE)分别为2.38、2.61和2.63。我们在2/3的场景中优于GPT2SP，而在最具挑战性的场景中优于Deep-SE，且运行时间显著减少。我们还将我们的方法与不同的同构图神经网络模型进行了比较，结果表明异构图神经网络模型在故事点估计方面优于同构模型。在时间性能方面，我们在节点嵌入初始化、模型构建和故事点估计这三个过程中都实现了大约570秒的时间性能。从b[22]可以获得HeterpSP的构件。结论。异构图神经网络模型HeteroSP在故事点估计方面取得了良好的准确性和运行时间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

自引率

0.00%

发文量