Comparative Analysis for Predicting Non-Functional Requirements using Supervised Machine Learning

2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA) | Pub Date: 2021-04-06 | DOI: 10.1109/CAIDA51941.2021.9425236

Vajeeha Mir Khatian, Qasim Ali Arain, Mamdouh Alenezi, Muhammad Owais Raza, Fariha Shaikh, Isma Farah


Citation count: 4

Abstract

Functional requirements (FRs) and non-functional requirements (NFRs) are two important aspects of the requirements gathering phase (RGP) in any system development lifecycle (SDLC) model. FRs are comparatively simple to understand and easy to extract from user stories during the RGP. NFRs, on the other hand, are critical: they play a significant role in improving product quality and are used to determine whether a designed system is accepted. Within the NFRs, several quality factors each focus on a specific quality attribute of a system, such as security, performance, or reliability. Classifying NFRs into these categories is a challenging task. This paper focuses on predicting the classification of NFRs using supervised machine learning (ML) algorithms, followed by a comparative analysis of five ML algorithms: decision tree, k-nearest neighbor (KNN), random forest classifier (RFC), naïve Bayes, and logistic regression (LR). The study was conducted in two phases. In the first phase, a model was designed that accepts a dataset of textual requirements covering 11 quality attributes to be predicted, with 85% of the data used for training and 15% for testing. In the second phase, the performance of each algorithm was evaluated using four evaluation measures: precision, recall, accuracy, and the confusion matrix. The results of the comparative analysis show that LR performs best of the five algorithms, with very high prediction rates and 75% accuracy. Naïve Bayes ranked second with 66% accuracy, the decision tree third with 60%, RFC fourth with 53%, and KNN lowest with 50%. LR should therefore be preferred for predicting the classification of NFRs.
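The evaluation measures named in the abstract (confusion matrix, accuracy, precision, recall) and the 85%/15% split can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the three class labels, the toy predictions, and the helper names (`confusion_matrix`, `accuracy`, `precision_recall`, `split_train_test`) are assumptions made purely for illustration, and the split shown takes the last 15% of items without shuffling, which the paper does not specify.

```python
# Illustrative sketch only: toy labels and predictions, not data from the paper.

def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def precision_recall(matrix, i):
    """Per-class precision (tp / column sum) and recall (tp / row sum) for class i."""
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # everything predicted as class i
    actual = sum(matrix[i])                    # everything truly in class i
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

def split_train_test(items, test_fraction=0.15):
    """Hold out the last `test_fraction` of items (no shuffling, an assumption)."""
    cut = len(items) - round(len(items) * test_fraction)
    return items[:cut], items[cut:]

# Three hypothetical NFR categories; the paper uses 11 quality attributes.
labels = ["security", "performance", "reliability"]
y_true = ["security", "security", "security", "performance",
          "performance", "reliability", "reliability", "reliability"]
y_pred = ["security", "performance", "performance", "performance",
          "performance", "reliability", "security", "reliability"]

matrix = confusion_matrix(y_true, y_pred, labels)
train, test = split_train_test(list(range(20)))

if __name__ == "__main__":
    print("confusion matrix:", matrix)
    print("accuracy:", accuracy(matrix))
    for i, label in enumerate(labels):
        print(label, precision_recall(matrix, i))
    print("train size:", len(train), "test size:", len(test))
```

With 11 classes, per-class precision and recall are usually reported alongside overall accuracy, because a single accuracy figure can hide classes that a classifier rarely predicts correctly.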


