Uncovering deep-rooted cultural differences (UNCOVER)

Aleksey Panasyuk, Bryan Li, Christopher Callison-Burch

Defense + Commercial Sensing, vol. 89, no. 1, pp. 130580Z–130580Z-23
Published 2024-06-06 · DOI: 10.1117/12.3012714 (https://doi.org/10.1117/12.3012714)
Citations: 0
Abstract
This study delves into the interconnected realms of Debates, Fake News, and Propaganda, with an emphasis on discerning prominent ideological underpinnings distinguishing Russian from English authors. Leveraging the advanced capabilities of Large Language Models (LLMs), particularly GPT-4, we process and analyze a large corpus of over 80,000 Wikipedia articles to unearth significant insights. Despite the inherent linguistic distinctions between Russian and English texts, our research highlights the adeptness of LLMs in bridging these variances. Our approach includes translation, question generation and answering, along with emotional analysis, to probe the gathered information. A ranking metric based on the emotional content is used to assess the impact of our approach. Furthermore, our research identifies important limitations within existing data resources for propaganda identification. To address these challenges and foster future research, we present a curated synthetic dataset designed to encompass a diverse spectrum of topics and achieve balance across various propaganda types.
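The abstract describes ranking documents by a metric based on their emotional content, but does not specify the metric itself. The following is a minimal hypothetical sketch, assuming per-document emotion scores (e.g., produced upstream by an LLM or emotion classifier) are already available; the function names, the score format, and the "sum of non-neutral scores" aggregation are all illustrative assumptions, not the paper's actual method.

```python
def emotional_intensity(scores):
    # Hypothetical aggregation: total weight of non-neutral emotions
    # in one document's score distribution.
    return sum(v for k, v in scores.items() if k != "neutral")

def rank_by_emotion(docs):
    # Order document ids from most to least emotionally charged.
    return sorted(docs, key=lambda d: emotional_intensity(docs[d]), reverse=True)

# Toy inputs standing in for classifier output on two articles.
docs = {
    "article_a": {"anger": 0.6, "fear": 0.2, "neutral": 0.2},
    "article_b": {"anger": 0.1, "fear": 0.1, "neutral": 0.8},
}
print(rank_by_emotion(docs))  # article_a ranks first (intensity 0.8 vs 0.2)
```

A real pipeline would replace the toy dictionaries with scores produced for each translated article, but the ranking step itself reduces to a sort over a scalar intensity as above.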