Entity-centric multi-domain transformer for improving generalization in fake news detection

IF 7.4 · CAS Region 1 (Management) · Q1 (Computer Science, Information Systems) · Information Processing & Management · Pub Date: 2024-06-14 · DOI: 10.1016/j.ipm.2024.103807
Parisa Bazmi, Masoud Asadpour, Azadeh Shakery, Abbas Maazallahi
{"title":"Entity-centric multi-domain transformer for improving generalization in fake news detection","authors":"Parisa Bazmi ,&nbsp;Masoud Asadpour ,&nbsp;Azadeh Shakery ,&nbsp;Abbas Maazallahi","doi":"10.1016/j.ipm.2024.103807","DOIUrl":null,"url":null,"abstract":"<div><p>Fake news has become a significant concern in recent times, particularly during the COVID-19 pandemic, as spreading false information can pose significant public health risks. Although many models have been suggested to detect fake news, they are often limited in their ability to extend to new emerging domains since they are designed for a single domain. Previous studies on multidomain fake news detection have focused on developing models that can perform well on multiple domains, but they often lack the ability to generalize to new unseen domains, which limits their effectiveness. To overcome this limitation, in this paper, we propose the Entity-centric Multi-domain Transformer (EMT) model. EMT uses entities in the news as key components in learning domain-invariant and domain-specific news representations, which addresses the challenges of domain shift and incomplete domain labeling in multidomain fake news detection. It incorporates entity background information from external knowledge sources to enhance fine-grained news domain representation. EMT consists of a Domain-Invariant (DI) encoder, a Domain-Specific (DS) encoder, and a Cross-Domain Transformer (CT) that facilitates investigation of domain relationships and knowledge interaction with input news, enabling effective generalization. We evaluate the EMT's performance in multi-domain fake news detection across three settings: supervised multi-domain, zero-shot setting on new unseen domain, and limited samples from new domain. EMT demonstrates greater stability than state-of-the-art models when dealing with domain changes and varying training data. Specifically, in the zero-shot setting on new unseen domains, EMT achieves a good F1 score of approximately 72 %. The results highlight the effectiveness of EMT's entity-centric approach and its potential for real-world applications, as it demonstrates the ability to adapt to various training settings and outperform existing models in handling limited label data and adapting to previously unseen domains.</p></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457324001663","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Fake news has become a significant concern in recent times, particularly during the COVID-19 pandemic, as spreading false information can pose serious public health risks. Although many models have been proposed to detect fake news, they are often limited in their ability to extend to newly emerging domains because they are designed for a single domain. Previous studies on multi-domain fake news detection have focused on developing models that perform well across multiple domains, but these models often cannot generalize to new, unseen domains, which limits their effectiveness. To overcome this limitation, in this paper we propose the Entity-centric Multi-domain Transformer (EMT) model. EMT uses entities in the news as key components in learning domain-invariant and domain-specific news representations, which addresses the challenges of domain shift and incomplete domain labeling in multi-domain fake news detection. It incorporates entity background information from external knowledge sources to enhance fine-grained news domain representation. EMT consists of a Domain-Invariant (DI) encoder, a Domain-Specific (DS) encoder, and a Cross-Domain Transformer (CT) that facilitates investigation of domain relationships and knowledge interaction with the input news, enabling effective generalization. We evaluate EMT's performance in multi-domain fake news detection across three settings: supervised multi-domain, zero-shot on a new unseen domain, and limited samples from a new domain. EMT demonstrates greater stability than state-of-the-art models when dealing with domain changes and varying amounts of training data. Specifically, in the zero-shot setting on new unseen domains, EMT achieves an F1 score of approximately 72%. The results highlight the effectiveness of EMT's entity-centric approach and its potential for real-world applications: it adapts to various training settings and outperforms existing models in handling limited labeled data and previously unseen domains.
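
To make the architecture described above concrete, the following is a minimal, illustrative sketch of an entity-centric multi-domain classifier in PyTorch. Only the component names (Domain-Invariant encoder, Domain-Specific encoder, Cross-Domain Transformer) come from the abstract; the hidden size, number of domains, layer choices, and the way entity knowledge is fused are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class EntityCentricMultiDomainTransformer(nn.Module):
    """Illustrative sketch only; hyperparameters and fusion strategy are assumed."""

    def __init__(self, hidden_dim=768, num_domains=9, num_heads=8):
        super().__init__()
        # Domain-Invariant (DI) encoder: shared across all news domains.
        self.di_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=num_heads, batch_first=True),
            num_layers=2,
        )
        # Domain-Specific (DS) side: one learned embedding per training domain
        # (a simplification of whatever per-domain modeling the paper uses).
        self.domain_embeddings = nn.Embedding(num_domains, hidden_dim)
        # Cross-Domain Transformer (CT) stand-in: cross-attention from the news
        # representation to the domain embeddings, so related domains can be reused.
        self.cross_domain_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, 2)  # fake vs. real

    def forward(self, news_tokens, entity_embeddings):
        # news_tokens: (batch, seq_len, hidden_dim) pretrained text embeddings
        # entity_embeddings: (batch, n_entities, hidden_dim) background knowledge
        # Entity-centric input: append entity knowledge to the news tokens.
        x = torch.cat([news_tokens, entity_embeddings], dim=1)
        invariant = self.di_encoder(x).mean(dim=1)  # (batch, hidden_dim)
        # Attend from the pooled news representation to all domain embeddings.
        domains = self.domain_embeddings.weight.unsqueeze(0).expand(x.size(0), -1, -1)
        specific, _ = self.cross_domain_attn(invariant.unsqueeze(1), domains, domains)
        specific = specific.squeeze(1)  # (batch, hidden_dim)
        return self.classifier(torch.cat([invariant, specific], dim=-1))


# Usage example with random tensors standing in for real embeddings.
model = EntityCentricMultiDomainTransformer()
news = torch.randn(4, 128, 768)      # 4 articles, 128 tokens each
entities = torch.randn(4, 6, 768)    # 6 linked entities per article
logits = model(news, entities)       # shape: (4, 2)
```

In the zero-shot setting the abstract describes, the domain-specific signal here is obtained by attending over all learned domain embeddings rather than looking up a single known domain, so an article from an unseen domain can still borrow from related domains; this mirrors the cross-domain knowledge interaction the abstract attributes to the CT module.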

Source journal
Information Processing & Management (Engineering & Technology – Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.
Latest articles from this journal
Fusing temporal and semantic dependencies for session-based recommendation
A Universal Adaptive Algorithm for Graph Anomaly Detection
A context-aware attention and graph neural network-based multimodal framework for misogyny detection
Multi-granularity contrastive zero-shot learning model based on attribute decomposition
Asymmetric augmented paradigm-based graph neural architecture search