Uncovering deep-rooted cultural differences (UNCOVER)

Aleksey Panasyuk, Bryan Li, Christopher Callison-Burch

Defense + Commercial Sensing, vol. 89, no. 1, pp. 130580Z–130580Z-23
Published 2024-06-06 · DOI: 10.1117/12.3012714 (https://doi.org/10.1117/12.3012714)
Citations: 0
Abstract
This study delves into the interconnected realms of Debates, Fake News, and Propaganda, with an emphasis on discerning prominent ideological underpinnings distinguishing Russian from English authors. Leveraging the advanced capabilities of Large Language Models (LLMs), particularly GPT-4, we process and analyze a large corpus of over 80,000 Wikipedia articles to unearth significant insights. Despite the inherent linguistic distinctions between Russian and English texts, our research highlights the adeptness of LLMs in bridging these variances. Our approach includes translation, question generation and answering, along with emotional analysis, to probe the gathered information. A ranking metric based on the emotional content is used to assess the impact of our approach. Furthermore, our research identifies important limitations within existing data resources for propaganda identification. To address these challenges and foster future research, we present a curated synthetic dataset designed to encompass a diverse spectrum of topics and achieve balance across various propaganda types.
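The abstract describes ranking documents by a metric based on their emotional content, but does not specify the metric itself. The following is a minimal hypothetical sketch, assuming per-document emotion scores (e.g., produced upstream by an LLM or emotion classifier) are already available; the function names, the score format, and the "sum of non-neutral scores" aggregation are all illustrative assumptions, not the paper's actual method.

```python
def emotional_intensity(scores):
    # Hypothetical aggregation: total weight of non-neutral emotions
    # in one document's score distribution.
    return sum(v for k, v in scores.items() if k != "neutral")

def rank_by_emotion(docs):
    # Order document ids from most to least emotionally charged.
    return sorted(docs, key=lambda d: emotional_intensity(docs[d]), reverse=True)

# Toy inputs standing in for classifier output on two articles.
docs = {
    "article_a": {"anger": 0.6, "fear": 0.2, "neutral": 0.2},
    "article_b": {"anger": 0.1, "fear": 0.1, "neutral": 0.8},
}
print(rank_by_emotion(docs))  # article_a ranks first (intensity 0.8 vs 0.2)
```

A real pipeline would replace the toy dictionaries with scores produced for each translated article, but the ranking step itself reduces to a sort over a scalar intensity as above.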