Classification of texts on emergency situations in Almaty

Kompleksnoe Ispolzovanie Mineralnogo Syra; IF 1.1; Q4, Metallurgy & Metallurgical Engineering; Pub Date: 2023-04-30; DOI: 10.31643/2023/6445.36

M.Y. Andirov, Zh.Zh. Assan, S. Nopembri, A. Seilkhan, D.E. Myrzakhmetov



Abstract

Text classification is a multi-stage process for assigning texts of diverse structure to categories. In this article, machine learning algorithms, including support vector machines, logistic regression, and k-nearest neighbors, are implemented to classify texts collected from emergency news sites in Almaty. Data collection and the subsequent processing of the collected data played a particularly important role in the experiment. The data were obtained by automated collection of information from open sources using a script. Before classification, the data set was preprocessed; the steps included stop-word removal, tokenization, stemming, lemmatization, feature extraction, and the construction of feature vectors. Experimental results show that the classifier based on logistic regression performs best among the algorithms tested. Performance indicators were obtained for each algorithm, which allows a comparative analysis between them.
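The pipeline outlined in the abstract (preprocessing, feature vectors, classification) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping stemmer, the toy English headlines, and all function names are assumptions made for illustration. Only a k-nearest-neighbors classifier is shown, and the TF-IDF weighting and cosine similarity it uses are likewise assumptions, since the abstract does not specify the feature scheme or distance measure.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "and", "was", "were", "is", "to", "by"}

def crude_stem(token):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def fit_idf(token_lists):
    """Compute smoothed inverse document frequencies from the training documents."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)  # unseen terms are ignored
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}

def cosine(u, v):
    """Cosine similarity of two already-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())

def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy headlines, not data from the paper.
TRAIN = [
    ("Fire destroyed a warehouse in Almaty", "fire"),
    ("Firefighters extinguished a burning apartment", "fire"),
    ("Flames spread through the market building", "fire"),
    ("Earthquake tremors were felt across the city", "earthquake"),
    ("Seismologists recorded a strong earthquake", "earthquake"),
    ("Aftershocks followed the earthquake near Almaty", "earthquake"),
]

train_tokens = [preprocess(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(tokens, idf) for tokens in train_tokens]

def classify(text, k=3):
    """Preprocess a new text, vectorize it, and classify it with k-NN."""
    return knn_predict(tfidf_vector(preprocess(text), idf), train_vecs, train_labels, k)

print(classify("A strong earthquake shook Almaty"))
print(classify("Firefighters fought a fire at the market"))
```

In practice a library such as scikit-learn would typically supply the vectorizer and the classifiers, so that support vector machines, logistic regression, and k-nearest neighbors can all be evaluated on the same feature vectors.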


