Uncovering deep-rooted cultural differences (UNCOVER)

Aleksey Panasyuk, Bryan Li, Christopher Callison-Burch
Journal: Defense + Commercial Sensing, volume 89 1, pages 130580Z - 130580Z-23
Publication date: 2024-06-06
Publication type: Journal Article
DOI: 10.1117/12.3012714 (https://doi.org/10.1117/12.3012714)
Citation count: 0

Abstract

This study examines the interconnected realms of debates, fake news, and propaganda, with an emphasis on discerning the prominent ideological underpinnings that distinguish Russian from English authors. Leveraging the capabilities of Large Language Models (LLMs), particularly GPT-4, we process and analyze a large corpus of over 80,000 Wikipedia articles. Despite the inherent linguistic distinctions between Russian and English texts, our research highlights the ability of LLMs to bridge these differences. Our approach includes translation, question generation and answering, and emotional analysis to probe the gathered information. A ranking metric based on the emotional content is used to assess the impact of our approach. Furthermore, our research identifies important limitations within existing data resources for propaganda identification. To address these challenges and foster future research, we present a curated synthetic dataset designed to encompass a diverse spectrum of topics and achieve balance across various propaganda types.