Vajeeha Mir Khatian, Qasim Ali Arain, Mamdouh Alenezi, Muhammad Owais Raza, Fariha Shaikh, Isma Farah
Title: Comparative Analysis for Predicting Non-Functional Requirements using Supervised Machine Learning
Published in: 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), 2021-04-06
DOI: 10.1109/CAIDA51941.2021.9425236 (https://doi.org/10.1109/CAIDA51941.2021.9425236)
Citations: 4
Abstract
Functional requirements (FRs) and non-functional requirements (NFRs) are two important aspects of the requirements gathering phase (RGP) in any system development lifecycle (SDLC) model. FRs are comparatively simple to understand and easy to extract from user stories during the RGP. NFRs, on the other hand, are critical: they play a significant role in improving product quality and are used to determine whether a designed system is accepted. Within the NFRs, several quality factors each address a specific quality attribute of a system, such as security, performance, or reliability. Classifying NFRs into these categories is a challenging task. This paper focuses on predicting the category of NFRs with supervised machine learning (ML) and presents a comparative analysis of five ML algorithms: decision tree, k-nearest neighbor (KNN), random forest classifier (RFC), naïve Bayes, and logistic regression (LR). The study was conducted in two phases. In the first phase, a model was designed that accepts a dataset of textual requirements covering 11 quality attributes to be predicted, with 85% of the data used for training and 15% for testing. In the second phase, the performance of each algorithm was evaluated using four evaluation measures: precision, recall, accuracy, and the confusion matrix. The results of the comparative analysis show that LR performs best of all the algorithms, with the highest prediction rates and 75% accuracy. Naïve Bayes ranked second with 66% accuracy, the decision tree third with 60%, RFC fourth with 53%, and KNN lowest with 50%. The LR algorithm should therefore be preferred for predicting the classification of NFRs.
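As a rough illustration of the evaluation setup described in the abstract, the sketch below shows an 85%/15% split and how accuracy, precision, and recall can be read off a confusion matrix. It is not the authors' code: the function names (`split_85_15`, `confusion_matrix`, `accuracy`, `precision_recall`), the random seed, and the toy labels are all assumptions made for this example, and the abstract does not say how the split was drawn or how per-class scores were averaged.

```python
from collections import Counter
import random


def split_85_15(samples, seed=0):
    """Shuffle and split samples into 85% training and 15% test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = round(len(shuffled) * 0.85)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(cm):
    """Fraction of predictions whose label matches the true label."""
    total = sum(cm.values())
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / total if total else 0.0


def precision_recall(cm, label):
    """One-vs-rest precision and recall for a single class label."""
    tp = cm[(label, label)]  # Counter returns 0 for unseen pairs
    predicted = sum(n for (t, p), n in cm.items() if p == label)
    actual = sum(n for (t, p), n in cm.items() if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical labels for illustration only; not data from the paper.
y_true = ["security", "security", "performance",
          "performance", "reliability", "reliability"]
y_pred = ["security", "performance", "performance",
          "performance", "reliability", "security"]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
sec_precision, sec_recall = precision_recall(cm, "security")
```

With 11 quality attributes as classes, the same functions would produce an 11-by-11 confusion matrix and one precision/recall pair per attribute; a single reported figure would then depend on an averaging choice (for example macro or weighted), which the abstract leaves unspecified.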