Enhancing requirements-to-code traceability with GA-XWCoDe: Integrating XGBoost, Node2Vec, and genetic algorithms for improving model performance and stability
Zhiyuan Zou , Bangchao Wang , Xinrong Hu , Yang Deng , Hongyan Wan , Huan Jin
{"title":"Enhancing requirements-to-code traceability with GA-XWCoDe: Integrating XGBoost, Node2Vec, and genetic algorithms for improving model performance and stability","authors":"Zhiyuan Zou , Bangchao Wang , Xinrong Hu , Yang Deng , Hongyan Wan , Huan Jin","doi":"10.1016/j.jksuci.2024.102197","DOIUrl":null,"url":null,"abstract":"<div><div>This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With Code Dependency (GA-XWCoDe), which integrates eXtreme Gradient Boosting (XGBoost) with a Node2Vec model-weighted code dependency strategy and genetic algorithms for parameter optimisation. XGBoost mitigates overfitting and enhances model stability, while Node2Vec improves prediction accuracy for low-confidence links. Genetic algorithms are employed to optimise model parameters efficiently, reducing the resource intensity of traditional methods. Experimental results show that GA-XWCoDe outperforms the state-of-the-art method TRAceability lInk cLassifier (TRAIL) by 17.44% and Deep Forest for Requirement traceability (DF4RT) by 33.36% in terms of average F1 performance across four datasets. It is significantly superior to all baseline methods at a confidence level of <span><math><mi>α</mi></math></span>¡0.01 and demonstrates exceptional performance and stability across various training data scales.</div></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 8","pages":"Article 102197"},"PeriodicalIF":5.2000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of King Saud University-Computer and Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1319157824002866","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With Code Dependency (GA-XWCoDe), which integrates eXtreme Gradient Boosting (XGBoost) with a Node2Vec model-weighted code dependency strategy and genetic algorithms for parameter optimisation. XGBoost mitigates overfitting and enhances model stability, while Node2Vec improves prediction accuracy for low-confidence links. Genetic algorithms are employed to optimise model parameters efficiently, reducing the resource intensity of traditional methods. Experimental results show that GA-XWCoDe outperforms the state-of-the-art method TRAceability lInk cLassifier (TRAIL) by 17.44% and Deep Forest for Requirement traceability (DF4RT) by 33.36% in terms of average F1 performance across four datasets. It is significantly superior to all baseline methods at a confidence level of ¡0.01 and demonstrates exceptional performance and stability across various training data scales.
期刊介绍:
In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author paid open access journal. Authors who submit their manuscript after October 31st 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University Computer and Information Sciences is a refereed, international journal that covers all aspects of both foundations of computer and its practical applications.