Automatic Classification of University Staff Enquiries in Russian and English
Abylay Omar, S. Kadyrov, Yerbol Baigarayev
2021 16th International Conference on Electronics Computer and Computation (ICECCO), 2021-11-25
DOI: https://doi.org/10.1109/icecco53203.2021.9663851
Abstract
Document (text) classification is a standard task in supervised machine learning. In this study we consider the multi-label classification of helpdesk enquiries submitted by university staff. To this end, we collected our own dataset of enquiries written in either Russian or English. The enquiries were categorized into eight labels and passed through a preprocessing stage. The classical Term Frequency-Inverse Document Frequency (TF-IDF) weighting was applied to the preprocessed data for feature extraction. For classification, the Support Vector Machine and Multinomial Naive Bayes algorithms were trained and their experimental results compared. The results show that, in both languages, the Support Vector Machine outperforms Multinomial Naive Bayes.
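As an illustration of the feature-extraction and classification steps named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. The toy enquiries, the two labels `account` and `hardware`, the tokenizer, the smoothed IDF formula, and all function names are assumptions made for the example; the study itself uses eight labels that the abstract does not list. The sketch assigns a single label per enquiry and covers only the Multinomial Naive Bayes baseline; the Support Vector Machine is omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy enquiries; texts and labels are invented for illustration.
TRAIN = [
    ("cannot reset my email password", "account"),
    ("password expired for portal login", "account"),
    ("не могу войти в учетную запись", "account"),
    ("printer in room 204 is out of toner", "hardware"),
    ("projector does not turn on", "hardware"),
    ("принтер не печатает документы", "hardware"),
]


def tokenize(text):
    """Lowercase and keep runs of letters (covers Latin and Cyrillic)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def fit_idf(token_docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with additive smoothing on TF-IDF weights."""
    weight = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, value in vec.items():
            weight[label][term] += value
    class_counts = Counter(labels)
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(weight[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((weight[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like


def predict(vec, model):
    """Return the label with the highest posterior log-score."""
    log_prior, log_like = model

    def score(c):
        return log_prior[c] + sum(w * log_like[c][t] for t, w in vec.items())

    return max(log_prior, key=score)


token_docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(token_docs)
model = train_nb([tfidf(d, idf) for d in token_docs], labels, list(idf))


def classify(text):
    """Tokenize, vectorize, and classify one enquiry."""
    return predict(tfidf(tokenize(text), idf), model)


print(classify("forgot my portal password"))
print(classify("принтер не работает"))
```

In practice a library such as scikit-learn would normally replace the hand-written parts, and a linear SVM could be trained on the same TF-IDF vectors to reproduce the kind of comparison the abstract describes.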