CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

Q3 Environmental Science AACL Bioflux Pub Date : 2022-10-10 DOI:10.48550/arXiv.2210.04447

Momchil Hardalov, Anton Chernyavskiy, Ivan Koychev, Dmitry I. Ilvovsky, Preslav Nakov


Citations: 4

Abstract


While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return an article that explains their decision. This is a sensible approach, as people trust manual fact-checking and many claims are repeated multiple times. Yet, a major issue when building such systems is the small number of known tweet–verifying article pairs available for training. Here, we aim to bridge this gap by making use of crowd fact-checking, i.e., mining claims in social media for which users have responded with a link to a fact-checking article. In particular, we mine a large-scale collection of 330,000 tweets, each paired with a corresponding fact-checking article. We further propose an end-to-end framework to learn from this noisy data based on modified self-adaptive training, in a distant supervision scenario. Our experiments on the CLEF’21 CheckThat! test set show improvements over the state of the art by two points absolute. Our code and datasets are available at https://github.com/mhardalov/crowdchecked-claims.
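The crowd fact-checking idea in the abstract (pair a tweet with the fact-checking article that another user linked in a reply) can be illustrated with a minimal sketch. This is not the authors' pipeline. The `Reply` record, the `mine_pairs` and `is_fact_check_url` helpers, and the one-entry domain set are all hypothetical names and values chosen for illustration. The released code at the repository above is the authoritative reference.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: an illustrative set of fact-checking domains, not the authors' list.
FACT_CHECK_DOMAINS = {"snopes.com"}


@dataclass(frozen=True)
class Reply:
    """Hypothetical record for one reply in a conversation thread."""
    parent_text: str   # text of the tweet being replied to (the candidate claim)
    reply_urls: tuple  # URLs found in the reply


def is_fact_check_url(url: str) -> bool:
    """Return True if the URL's host is in the fact-checking domain set."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in FACT_CHECK_DOMAINS


def mine_pairs(replies):
    """Collect (claim text, fact-check URL) pairs from replies.

    The pairs are only distantly supervised: a reply may link an article
    that does not actually verify the parent tweet, so the labels are noisy.
    """
    pairs = []
    for reply in replies:
        for url in reply.reply_urls:
            if is_fact_check_url(url):
                pairs.append((reply.parent_text, url))
    return pairs
```

Pairs mined this way would presumably be the noisy input that a training procedure such as the self-adaptive approach mentioned in the abstract has to cope with. The sketch covers only the pairing step, not the training.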


Source journal

AACL Bioflux Environmental Science-Management, Monitoring, Policy and Law

CiteScore

1.40

Self-citation rate

0.00%

Publication count