Interpretability Analysis of Academic Achievement Prediction Based on Machine Learning
Jie Yang, Hong Wang
2021 11th International Conference on Information Technology in Medicine and Education (ITME), pp. 475-479, November 2021
DOI: 10.1109/ITME53901.2021.00101
Citations: 0
Abstract
In recent years, advances in artificial intelligence and information technology have ushered in the era of big data, and education-related data have grown substantially in both volume and variety. A growing number of researchers have begun mining educational data so that machine learning can help educators improve the quality of teaching. In this paper, several machine learning algorithms are applied to student performance data to predict students' academic performance and to offer teachers suggestions for improving it. The main contributions of this paper are as follows. First, the raw data are preprocessed to remove noisy records and records with missing values. Second, a variety of machine learning algorithms are used to model students' academic performance. Comparing the models' accuracy, recall, and F1 score identifies the Gradient Boosting Decision Tree (GBDT) classifier as the best single model. Third, the three best-performing models are combined as base learners in a new stacking ensemble, which achieves better results than the individual models. Finally, an interpretability analysis of the GBDT classifier evaluates the importance of the different features. It shows that “Visited resources”, “Raised hand”, “Student Absence Days”, and “Viewing announcements” are the factors that most strongly affect students' performance. The resulting model combines strong predictive performance with good interpretability.
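The workflow summarized above can be sketched with scikit-learn. It covers preprocessing, comparing classifiers by accuracy, recall, and F1, stacking the strongest models, and ranking feature importances with the GBDT. This is a minimal illustration under stated assumptions, not the authors' code.

The following are assumptions, since the abstract does not specify them:

- The data are synthetic.
- The feature names are hypothetical stand-ins for the factors named in the abstract.
- Random forest and logistic regression are used as the other two base learners.
- A logistic-regression meta-learner combines them.
- Recall and F1 use macro averaging.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names echoing the factors named in the abstract.
FEATURES = ["visited_resources", "raised_hands", "absence_days", "announcements_viewed"]


def make_synthetic_data(n=400, seed=0):
    """Synthetic stand-in for the paper's student data (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 100, size=(n, len(FEATURES)))
    # Assumed toy rule: engagement raises the score, absences lower it.
    score = X[:, 0] + X[:, 1] + 0.5 * X[:, 3] - 1.5 * X[:, 2] + rng.normal(0, 10, n)
    # Split the score into three equal-sized classes: 0 = low, 1 = mid, 2 = high.
    y = np.digitize(score, np.quantile(score, [1 / 3, 2 / 3]))
    X[rng.random(n) < 0.05, 1] = np.nan  # inject some missing values
    return X, y


def drop_missing(X, y):
    """Preprocessing step: discard rows that contain any missing value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def evaluate(model, X_tr, X_te, y_tr, y_te):
    """Fit the model in place and report accuracy, macro recall, and macro F1."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "accuracy": accuracy_score(y_te, pred),
        "recall": recall_score(y_te, pred, average="macro"),
        "f1": f1_score(y_te, pred, average="macro"),
    }


X, y = drop_missing(*make_synthetic_data())
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed choice of the "three best" base models.
base_models = {
    "gbdt": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}
results = {name: evaluate(m, X_tr, X_te, y_tr, y_te) for name, m in base_models.items()}

# Stacking: out-of-fold predictions of cloned base learners feed a meta-learner.
stack = StackingClassifier(
    estimators=list(base_models.items()),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
results["stacking"] = evaluate(stack, X_tr, X_te, y_tr, y_te)

# Impurity-based importances from the GBDT fitted above (normalized to sum to 1).
gbdt = base_models["gbdt"]
ranking = sorted(zip(FEATURES, gbdt.feature_importances_), key=lambda p: -p[1])

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
print(ranking)
```

Impurity-based `feature_importances_` are only one way to rank features for a GBDT. Permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative that is less biased toward features with many distinct values. The abstract does not state which interpretability method the paper used.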