Preliminary Experiments on the Performance of Machine Learning Models
Authors: Misheck Banda, E. Ngassam, Ernest Mnkandla
Published in: 2022 IST-Africa Conference (IST-Africa), 2022-05-16
DOI: 10.23919/IST-Africa56635.2022.9845534
Citation count: 0
Abstract
Artificial intelligence and its related machine learning technologies are constantly changing how organisations manage their business data in a dynamic environment of ubiquitous data sources and formats. Most organisations face the challenge of selecting appropriate machine learning models to extract insights from their existing business data, which may be unstructured and vary in form, type, and size. For this paper’s preliminary experiments, three machine learning models (logistic regression, random forest, and decision tree) were selected to predict the likelihood of passengers surviving the Titanic disaster. Our investigation revealed that specific models are required to handle specific dataset types, in this case categorical datasets. The findings indicate that a logistic regression model can be recommended for categorical datasets, based on its speed and the high prediction performance it achieved on the classification error metrics and in the confusion matrix. The selected models form part of a set of models currently being explored for the construction of hybrid machine learning models, which is beyond the scope of this paper.
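The abstract compares models on classification error metrics and a confusion matrix but does not say how these were computed. The following is a minimal sketch of one common way to derive them for a binary survived / did-not-survive prediction. The function names and the label lists are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, with 1 meaning 'survived'."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


def classification_metrics(y_true, y_pred):
    """Derive common classification metrics from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only; these are not results from the paper.
y_true = [1, 0, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(confusion_matrix(y_true, y_pred))
print(metrics)
```

Running the same functions on the predictions of each candidate model (for example logistic regression, random forest, and decision tree) gives directly comparable numbers, which is one way such a side-by-side evaluation could be organised.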