{"title":"Using advanced machine learning algorithms to predict academic major completion: A cross-sectional study","authors":"Alireza Kordbagheri , Mohammadreza Kordbagheri , Natalie Tayim , Abdulnaser Fakhrou , Mohammadreza Davoudi","doi":"10.1016/j.compbiomed.2024.109372","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Existing prediction methods for academic majors based on personality traits have notable gaps, including limited model complexity and generalizability.The current study aimed to utilize advanced Machine Learning (ML) algorithms with smoothing functions to predict academic majors completed based on personality subscales.</div></div><div><h3>Methods</h3><div>We used reports from 59,413 individuals to perform the current study. All advanced algorithms implemented in this article were based on R software (version 4.1.3, R Core Team, 2021). All model parameters were optimized based on resampling and cross-validation (CV). In addition, pseudo-R<sup>2</sup> as a robust metric has been used to compare the performance of models, which, unlike most studies, considers the quality of model-predicted probabilities.</div></div><div><h3>Result</h3><div>The results indicated that advanced ML models' performance on training and test data was superior to logistic regression. Pseudo-R<sup>2</sup> and AUC results showed that advanced models such as kNN, GBE, and RF had the highest scores based on test data compared to other models. The pseudo-R<sup>2</sup> values for the models used in this study varied across the test dataset; the lowest value belonged to the logistic regression algorithm at .022, and the highest value was recorded for the kNN algorithm at .099. 
The agreeableness subscale is the most influential component in predicting the completion of university education, followed by conscientiousness and emotional stability.</div></div><div><h3>Conclusion</h3><div>The potential of advanced methods to enhance the accuracy and validity of predictions is a promising development in our field. Their performance, particularly in handling large data sets with complex patterns, is a reason for optimism about the future of research in this area.</div></div>","PeriodicalId":10578,"journal":{"name":"Computers in biology and medicine","volume":"184 ","pages":"Article 109372"},"PeriodicalIF":7.0000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in biology and medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0010482524014574","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background
Existing methods for predicting academic majors from personality traits have notable gaps, including limited model complexity and limited generalizability. The current study aimed to use advanced machine learning (ML) algorithms with smoothing functions to predict which academic major a person completed, based on personality subscales.
Methods
We used reports from 59,413 individuals. All advanced algorithms were implemented in R (version 4.1.3, R Core Team, 2021). All model parameters were optimized using resampling and cross-validation (CV). In addition, pseudo-R² was used as a robust metric to compare model performance; unlike the metrics used in most studies, it accounts for the quality of the model-predicted probabilities.
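To illustrate why a pseudo-R² is sensitive to the quality of predicted probabilities, here is a minimal sketch. It assumes McFadden's variant (one minus the ratio of the model log-likelihood to the null log-likelihood) and a binary outcome; the abstract does not say which pseudo-R² definition the authors used. The study itself was run in R, so this Python version and the function names `log_likelihood` and `mcfadden_pseudo_r2` are illustrative only.

```python
import math


def log_likelihood(y_true, p_pred, eps=1e-12):
    """Bernoulli log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def mcfadden_pseudo_r2(y_true, p_pred):
    """McFadden's pseudo-R^2 (assumed variant): 1 - LL_model / LL_null.

    The null model predicts the observed base rate for every case.
    """
    base_rate = sum(y_true) / len(y_true)
    ll_model = log_likelihood(y_true, p_pred)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - ll_model / ll_null


# Hypothetical toy data: both models classify every case correctly at a
# 0.5 threshold, but the confident model assigns better probabilities.
y = [1, 0, 1, 0]
p_confident = [0.9, 0.1, 0.9, 0.1]
p_timid = [0.6, 0.4, 0.6, 0.4]

r2_confident = mcfadden_pseudo_r2(y, p_confident)
r2_timid = mcfadden_pseudo_r2(y, p_timid)
```

In this toy case the two models have identical classification accuracy, yet the confident model earns a higher pseudo-R², which is the property the Methods paragraph points to.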
Results
Advanced ML models outperformed logistic regression on both training and test data. On the test data, pseudo-R² and AUC were highest for advanced models such as kNN, GBE, and RF. Pseudo-R² on the test dataset varied across models, from .022 for logistic regression (the lowest) to .099 for kNN (the highest). The agreeableness subscale was the most influential component in predicting the completion of university education, followed by conscientiousness and emotional stability.
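For readers less familiar with AUC, the other metric reported here, the following sketch computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. This is a generic illustration in Python, not the authors' R code; the function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0   # positive correctly ranked above negative
            elif s_pos == s_neg:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: one positive (0.3) is ranked below one negative (0.4).
labels = [1, 0, 1, 0]
model_scores = [0.9, 0.1, 0.3, 0.4]
auc = roc_auc(labels, model_scores)
```

Because AUC depends only on how cases are ranked, it does not change when scores are rescaled monotonically, which is one reason to report it alongside a probability-sensitive metric. The quadratic pairwise loop is chosen for clarity; a rank-based formula would be preferable on a dataset of the size used in the study.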
Conclusion
The potential of advanced methods to enhance the accuracy and validity of predictions is a promising development in our field. Their performance, particularly in handling large data sets with complex patterns, is a reason for optimism about the future of research in this area.
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.