A systematic mapping to investigate the application of machine learning techniques in requirement engineering activities
Shoaib Hassan, Qianmu Li, Khursheed Aurangzeb, Affan Yasin, Javed Ali Khan, Muhammad Shahid Anwar
CAAI Transactions on Intelligence Technology, vol. 9, no. 6, pp. 1412-1434. Published 2024-06-10. DOI: 10.1049/cit2.12348. Impact factor: 8.4 (JCR Q1, Computer Science, Artificial Intelligence). https://onlinelibrary.wiley.com/doi/10.1049/cit2.12348
Citations: 0
Abstract
Over the past few years, the use of Machine Learning (ML) techniques has increased exponentially, driven by the continuous growth of data volumes and computing capacity. Despite the popularity of ML techniques, only a few studies have focused on applying ML, especially supervised learning techniques, to the problems that occur in Requirement Engineering (RE) activities. The authors conduct a systematic mapping of past work to investigate studies from the period 2002–2023 that applied supervised learning techniques in RE activities. The aim is to identify the research trends, main RE activities, ML algorithms, and data sources studied during this period. Forty-five studies were selected based on the authors' inclusion and exclusion criteria. The results show that the scientific community used 57 algorithms, of which the five most frequently used in RE activities are Decision Tree, Support Vector Machine, Naïve Bayes, K-nearest neighbour Classifier, and Random Forest. Researchers applied these algorithms in eight major RE activities: requirements analysis, failure prediction, effort estimation, quality, traceability, business rules identification, content classification, and detection of problems in requirements written in natural language. The selected studies used 32 private and 41 public data sources. The most popular data sources in the selected studies are the Metric Data Programme from NASA, Predictor Models in Software Engineering, and the iTrust Electronic Health Care System.
About the journal:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.