Assessing the Impact of Expert Labelling of Training Data on the Quality of Automatic Classification of Lithological Groups Using Artificial Neural Networks
Y. Kuchin, R. Mukhamediev, K. Yakunin, J. Grundspeņķis, A. Symagulov
Applied Computer Systems, volume 34 1, pages 145 - 152. Published 2020-12-01. DOI: 10.2478/acss-2020-0016 (https://doi.org/10.2478/acss-2020-0016)
Citations: 4
Abstract
Machine learning (ML) methods are now widely used to automate geophysical studies. Some ML algorithms are used to solve lithological classification problems during uranium mining. One of the key aspects of using classical ML methods is selecting data features and estimating their influence on the classification. This paper presents a quantitative assessment of the impact of expert opinions on the classification process. To this end, we prepared the data, identified the experts, and performed a series of experiments in which the expert identifier either was or was not supplied as an input to the automatic classifier during training and testing. A feedforward artificial neural network (ANN) was used as the classifier. The experiments show that the ANN's “knowledge” of which expert interpreted the data improves the quality of automatic classification in terms of accuracy (by 5 %) and recall (by 20 %). However, because the input parameters of the model may depend on each other, the SHapley Additive exPlanations (SHAP) method was used to further assess the impact of the expert identifier. SHAP made it possible to assess the degree of influence of each parameter. It revealed that the expert ID is at least twice as influential as any other input parameter of the neural network. This circumstance imposes significant restrictions on the application of ANNs to lithological classification at uranium deposits.
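The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature coalitions. This is not the SHAP library and not the authors' network or data: `toy_model`, its weights, the feature order (two logging features plus an expert ID), and the all-zero baseline are hypothetical choices made only to show how dependence between inputs is split among them.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(features):
    """Hypothetical score: two log readings plus an expert ID (assumed weights)."""
    log_a, log_b, expert_id = features
    # The product term makes log_a and expert_id interact.
    return 0.5 * log_a + 0.3 * log_b + 1.5 * expert_id + 0.4 * log_a * expert_id


if __name__ == "__main__":
    sample = [1.0, 1.0, 1.0]      # hypothetical input
    reference = [0.0, 0.0, 0.0]   # hypothetical baseline
    names = ["log_a", "log_b", "expert_id"]
    for name, value in zip(names, shapley_values(toy_model, sample, reference)):
        print(f"{name}: {value:.3f}")
```

In this toy setting the attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is shared equally between the two features involved. For a real neural network with many inputs, exact enumeration is infeasible and an approximation such as the one provided by the SHAP method would be used instead.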