Sprint2Vec: A Deep Characterization of Sprints in Iterative Software Development

IEEE Transactions on Software Engineering (IF 5.6; JCR Q1, Computer Science, Software Engineering; Zone 1, Computer Science). Pub Date: 2024-11-29. DOI: 10.1109/TSE.2024.3509016

Morakot Choetkiertikul;Peerachai Banyongrakkul;Chaiyong Ragkhitwetsagul;Suppawong Tuarob;Hoa Khanh Dam;Thanwadee Sunetnanta

Citation: IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 220-242. Full text: https://ieeexplore.ieee.org/document/10771809/


Abstract

Iterative approaches such as Agile Scrum are widely adopted to improve the software development process, yet schedule and budget overruns persist in many software projects. Several approaches apply machine learning, particularly classification, to support decision-making in iterative software development, but they often characterize a sprint only to predict productivity. We introduce Sprint2Vec, which draws on three aspects of sprint information (sprint attributes, issue attributes, and the developers involved in a sprint) to characterize a sprint comprehensively and predict both its productivity and its quality outcomes. Our approach combines traditional feature extraction with automated, deep learning-based unsupervised feature learning. Methods such as Long Short-Term Memory (LSTM) allow us to learn features from unstructured data, such as textual descriptions of issues and sequences of developer activities. We evaluated the approach on two regression tasks: predicting the deliverability of a sprint (i.e., the amount of work delivered from it) and its quality (i.e., the amount of delivered work that requires rework). Results on five well-known open-source projects (Apache, Atlassian, Jenkins, Spring, and Talendforge) show that our approach outperforms baseline and alternative approaches.
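To make the three-aspect characterization and the two regression targets concrete, here is a minimal sketch. It is not the authors' implementation: mean pooling stands in for the LSTM-based feature learning, story points are assumed as the unit of work, and the names `Issue`, `mean_pool`, `sprint_vector`, and `targets`, as well as all numbers, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean


@dataclass
class Issue:
    story_points: float     # assumed unit of "work"
    delivered: bool         # completed within the sprint
    reworked: bool          # needed rework after delivery (hypothetical flag)
    embedding: list[float]  # stand-in for a learned text representation


def mean_pool(vectors):
    """Average equal-length vectors element-wise (a stand-in for learned pooling)."""
    return [fmean(column) for column in zip(*vectors)]


def sprint_vector(sprint_attrs, issues, developer_embeddings):
    """Concatenate the three aspects: sprint, issue, and developer features."""
    issue_part = mean_pool([issue.embedding for issue in issues])
    developer_part = mean_pool(developer_embeddings)
    return list(sprint_attrs) + issue_part + developer_part


def targets(issues):
    """Return (deliverability, quality) as summed story points."""
    delivered = sum(i.story_points for i in issues if i.delivered)
    reworked = sum(i.story_points for i in issues if i.delivered and i.reworked)
    return delivered, reworked


# Made-up data for one sprint.
issues = [
    Issue(3.0, True, False, [0.2, 0.4]),
    Issue(5.0, True, True, [0.6, 0.0]),
    Issue(2.0, False, False, [0.1, 0.5]),
]
developers = [[1.0, 0.0], [0.0, 1.0]]
vec = sprint_vector([14.0, 3.0], issues, developers)  # e.g. duration, team size
print(len(vec), targets(issues))
```

A regressor would then be trained on vectors like `vec` to predict each target separately; the choice of regressor is left open here.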


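Source journal: IEEE Transactions on Software Engineering (Engineering Technology - Engineering: Electrical and Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months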

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.