Perspectives to Predict Dropout in University Students with Machine Learning
Martín Solís, T. Moreira, R. Gonzalez, Tatiana Fernandez, M. Hernandez
2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), 2018-07-01
DOI: 10.1109/IWOBI.2018.8464191
Citations: 34
Abstract
This study compares the performance of four machine learning algorithms, under different perspectives for constructing the data files, in predicting university student dropout. The algorithms were Random Forest, Neural Networks, Support Vector Machines, and Logistic Regression. Random Forest, with 10 variables randomly sampled as candidates at each split, predicted dropouts best. The best perspective for training the algorithm was to use information on all semesters that students take within a given period of time, with a classification variable that defines the non-dropout as the graduated student. In a first validation sample, this approach correctly predicted 91% of dropouts, with a sensitivity of 87%.
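The data perspective described in the abstract (one row per student-semester, with graduates as the non-dropout class) can be illustrated with a minimal sketch. This is not the authors' code: the record fields (`credits_passed`, `gpa`), the status strings, and the choice to skip students who are still enrolled are all assumptions made for illustration. The `sensitivity` helper uses the standard definition, true positives divided by all actual dropouts.

```python
from dataclasses import dataclass


@dataclass
class SemesterRecord:
    student_id: str
    semester: int        # 1 = first semester the student enrolled
    credits_passed: int  # hypothetical feature, not from the paper
    gpa: float           # hypothetical feature, not from the paper


def build_training_rows(records, final_status):
    """Return one (features, label) pair per student-semester.

    Label 1 = dropout, 0 = graduated. Students with any other final
    status (for example, still enrolled) are skipped; this is an
    assumption about how "non-dropout = graduated" is applied.
    """
    label_of = {"dropout": 1, "graduated": 0}
    rows = []
    for r in records:
        status = final_status.get(r.student_id)
        if status not in label_of:
            continue
        rows.append(((r.semester, r.credits_passed, r.gpa), label_of[status]))
    return rows


def sensitivity(y_true, y_pred):
    """Share of actual dropouts (label 1) that were predicted as dropouts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data, invented for this example only.
records = [
    SemesterRecord("a", 1, 12, 3.1),
    SemesterRecord("a", 2, 9, 2.4),
    SemesterRecord("b", 1, 15, 3.8),
    SemesterRecord("c", 1, 6, 2.0),
]
final_status = {"a": "dropout", "b": "graduated", "c": "enrolled"}

rows = build_training_rows(records, final_status)
labels = [label for _, label in rows]
```

Rows built this way could then be passed to any classifier. The setting of 10 candidate variables per split reported above corresponds to what scikit-learn calls `max_features` on `RandomForestClassifier`, though the abstract does not state which software the authors used.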