The words that make fake stories go viral: A corpus-based approach to analyzing Russian Covid-19 disinformation

Russian Journal of Linguistics (IF 0.9, LANGUAGE & LINGUISTICS) Pub Date: 2023-09-30 DOI: 10.22363/2687-0088-33757

Alina G. Monogarova, Tatyana A. Shiryaeva, Elena V. Tikhonova


Citations: 0

Abstract



Since the outbreak of the Covid-19 pandemic in 2020, the spread of the new virus has been accompanied by a growing infodemic that poses a danger to Internet users. Social media and online messengers have been instrumental in making fake stories about Covid-19 go viral. The lack of an efficient instrument for classifying digital texts as true or fake remains a major challenge. Deceptive content and its specific characteristics attract the attention of many linguists, making it one of the most popular contemporary topics in corpus-based research. This paper explores the language of viral Covid-related fake stories and identifies specific linguistic features that distinguish fake stories from real (authentic) news using quantitative and qualitative approaches to text analysis. The study draws on a self-compiled diachronic corpus of misleading Russian coronavirus-related social media posts (a target corpus of 897 texts) that were shared virally by Russian users through social media platforms and mobile messengers from March 2020 to March 2022, and on a reference corpus of genuine materials about the virus. First, we compared the two corpora using an interpretable set of features across language levels to determine whether there is evidence of significant variation between the language of fake and real news. Then, we used frequency profiling to extract other over-represented groups of words from both corpora. Finally, we analyzed the corresponding contexts to determine whether these features can be considered linguistic trends in Russian Covid-related fake story making. The findings on the role of these over-represented groups of words in fake narratives about coronavirus show that frequency profiling is efficient in indicating lexical patterns of the language of deception.
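The abstract does not say which statistic the authors used for frequency profiling. As an illustration only, the sketch below uses the log-likelihood keyness score, a common choice in corpus linguistics for finding words that are over-represented in a target corpus relative to a reference corpus. The function names, the tokenized toy input, and the `top_n` parameter are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one word.

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a zero count contributes nothing to the sum
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def overrepresented(target_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only words over-used in the target corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Toy, invented token lists (not data from the study).
fake = ["secret", "cure", "secret", "virus"]
real = ["virus", "vaccine", "virus", "study"]
for word, score in overrepresented(fake, real):
    print(word, round(score, 2))
```

On a real corpus the token lists would come from a tokenizer or lemmatizer suited to Russian, and a significance threshold on the score would normally be applied before the ranked words are examined in context.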
