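Application of machine learning methods in the classification of corruption related content in Russian-speaking and English-speaking Internet media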

E. Artemova, Aleksandr Maksimenko, Dmitriy Ohrimenko
Journal: Sociology: methodology, methods, mathematical modeling (Sociology: 4M)
DOI: 10.19181/4m.2021.52.5 (https://doi.org/10.19181/4m.2021.52.5)
Publication date: 2022-03-19
Publication type: Journal Article
Citations: 2

Abstract

The paper attempts to classify corruption-related content in Russian-language and English-language Internet media using machine learning methods. The proposed methodological approach is relevant and promising because, according to our earlier data, the corruption monitoring mechanisms used in foreign publications, which are based on advanced information technologies, have rather limited potential effectiveness and are not always adequately interpreted. The study sets out the principles and grounds for selecting the identification parameters and describes in detail the layout scheme of the collected news array. Automatic text processing took place in two stages (vectorization of the text and application of a learning model) and made it possible to solve four main tasks: highlighting a significant quote from a news article in order to identify a text on corruption topics; predicting the type of news message; predicting the relevant article of the Criminal Code of the Russian Federation under which responsibility for the described corruption offense is determined; and predicting the type of relationship in corruption offenses. The results showed that modern methods of automatic text processing cope successfully with the identification and classification of corruption-related content in both Russian and English.
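The two stages named in the abstract (vectorization of the text, then a learning model) can be illustrated with a minimal sketch. The abstract does not say which vectorizer or model the authors used, so everything below is an assumption made for illustration: TF-IDF stands in for the vectorization stage, a nearest-centroid classifier with cosine similarity stands in for the learning model, and the four training sentences, the labels `corruption` and `other`, and all function names are hypothetical. Only a simplified version of the first task (identifying corruption-related text) is shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real system would use a proper tokenizer.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    # Stage 1: turn a text into an L2-normalised sparse TF-IDF vector.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    # Stage 2 (training): sum the vectors of each class into one centroid.
    centroids = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, v in vectorize(doc, idf).items():
            centroids[label][t] += v
    return centroids


def predict(text, idf, centroids):
    # Stage 2 (inference): pick the class whose centroid is most similar (cosine).
    vec = vectorize(text, idf)

    def score(label):
        c = centroids[label]
        norm = math.sqrt(sum(v * v for v in c.values()))
        return sum(v * c.get(t, 0.0) for t, v in vec.items()) / norm if norm else 0.0

    return max(sorted(centroids), key=score)


# Hypothetical toy training data, not taken from the paper.
train_docs = [
    "official accepted a bribe for a state contract",
    "mayor charged with taking a bribe from a developer",
    "city opens a new public library downtown",
    "local team wins the regional football cup",
]
train_labels = ["corruption", "corruption", "other", "other"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
print(predict("minister accused of taking a bribe", idf, centroids))
```

The same two-stage shape would apply to the other tasks listed in the abstract by swapping the label set, for example news-message types or Criminal Code articles, although the paper may well handle those tasks differently.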