Classification of texts on emergency situations in Almaty
M.Y. Andirov, Zh.Zh. Assan, S. Nopembri, A. Seilkhan, D.E. Myrzakhmetov
Kompleksnoe Ispolzovanie Mineralnogo Syra, journal article, published 30 April 2023. DOI: 10.31643/2023/6445.36
Abstract
Text classification is a process comprising several stages and approaches for effectively classifying texts of diverse structure. In this article, three machine learning algorithms, the support vector machine (SVM), logistic regression, and the k-nearest neighbors (k-NN) method, are applied to classify texts collected from emergency news sites in Almaty. The data were obtained by automatically collecting information from open sources with a script, and this data collection stage, together with the subsequent processing of the data, played a particularly important role in the experiment. Before classification, the dataset was preprocessed, including stop-word removal, tokenization, stemming, lemmatization, feature extraction, and construction of feature vectors. The experimental results show that the logistic regression classifier outperforms the other algorithms. Performance metrics were obtained for each algorithm, which allows a comparative analysis between them.
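The pipeline summarized in the abstract (preprocessing, feature vectors, then SVM, logistic regression, and k-NN compared on the same data) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. The abstract does not specify the feature weighting, stemmer, category labels, or metrics, so the following are all assumptions: TF-IDF weighting, a toy suffix-stripping stemmer standing in for both stemming and lemmatization, the hypothetical "fire"/"traffic" labels, the invented English headlines, and accuracy as the metric. Every function name (`preprocess`, `fit_idf`, `vectorize`, `knn_predict`, `train_linear`) is hypothetical. The two linear models are trained by per-sample (sub)gradient descent without an intercept term to keep the example short.

```python
import math
import re
from collections import Counter

# Toy stand-ins (assumptions): a tiny English stop-word list and suffix rules.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to", "was", "were"}
SUFFIXES = ("ing", "ed", "s")


def preprocess(text):
    """Tokenize, drop stop words, and strip one suffix per token (crude stemming)."""
    stems = []
    for tok in re.findall(r"\w+", text.lower()):
        if tok in STOP_WORDS:
            continue
        suf = next((s for s in SUFFIXES if tok.endswith(s) and len(tok) - len(s) >= 3), "")
        stems.append(tok[: len(tok) - len(suf)])
    return stems


def fit_idf(docs):
    """Smoothed inverse document frequency of every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(u, v):
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller sparse vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(x, X, y, k=3):
    """Majority label of the k training vectors with the highest cosine similarity."""
    nearest = sorted(range(len(X)), key=lambda i: dot(x, X[i]), reverse=True)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def train_linear(X, y, loss, epochs=50, lr=0.5, reg=1e-3):
    """Per-sample SGD for a binary linear model (labels +1/-1, no intercept).

    loss="log" gives logistic regression; loss="hinge" gives a linear SVM.
    """
    w = {}
    for _ in range(epochs):
        for x, target in zip(X, y):
            margin = target * dot(w, x)
            if loss == "log":
                # Derivative of log(1 + exp(-margin)) w.r.t. the score; exp is clipped.
                g = -target / (1.0 + math.exp(min(margin, 50.0)))
            else:
                # Subgradient of max(0, 1 - margin) w.r.t. the score.
                g = -target if margin < 1 else 0.0
            for t in w:  # L2 weight decay
                w[t] *= 1 - lr * reg
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * g * v
    return w


# Invented toy headlines with two hypothetical categories (not the paper's data).
train = [
    ("Firefighters extinguished a fire in a residential building", "fire"),
    ("Smoke and flames were reported at a market; firefighters arrived", "fire"),
    ("A house fire was put out by rescue teams", "fire"),
    ("Two cars collided on the highway, the driver was injured", "traffic"),
    ("A bus crashed into a truck at an intersection", "traffic"),
    ("Road accident: a car hit a pedestrian near the crossing", "traffic"),
]
test = [
    ("Firefighters fought flames in a burning warehouse", "fire"),
    ("A truck and a car collided at the intersection", "traffic"),
]
docs = [preprocess(text) for text, _ in train]
idf = fit_idf(docs)
X = [vectorize(d, idf) for d in docs]
labels = [label for _, label in train]
signs = [1 if label == "fire" else -1 for label in labels]
weights = {loss: train_linear(X, signs, loss) for loss in ("log", "hinge")}
models = {
    "logistic regression": lambda x: "fire" if dot(weights["log"], x) >= 0 else "traffic",
    "linear SVM": lambda x: "fire" if dot(weights["hinge"], x) >= 0 else "traffic",
    "k-NN (k=3)": lambda x: knn_predict(x, X, labels, k=3),
}
accuracy = {}
for name, predict in models.items():
    hits = sum(predict(vectorize(preprocess(text), idf)) == gold for text, gold in test)
    accuracy[name] = hits / len(test)
    print(f"{name}: accuracy = {accuracy[name]:.2f}")
```

On six toy sentences the comparison says nothing about which algorithm is better; it only shows how the same feature vectors feed all three classifiers. In practice a library such as scikit-learn (`TfidfVectorizer`, `LogisticRegression`, `LinearSVC`, `KNeighborsClassifier`) would normally replace the hand-written parts, a proper stemmer or lemmatizer for the language of the news texts would replace the suffix rules, and a realistic evaluation would use a held-out split or cross-validation with precision, recall, and F1 alongside accuracy.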