Data Augmentations to Improve BERT-based Detection of Covid-19 Fake News on Twitter

Feby Dahlan, S. Suyanto
Published in: 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), 2023-02-16
DOI: 10.1109/ICCoSITE57641.2023.10127796 (https://doi.org/10.1109/ICCoSITE57641.2023.10127796)
Citations: 1

Abstract

Since Covid-19 spread across the entire world, news about it has been shared to reduce the impact of the outbreak. Social media, particularly Twitter, serves as a reliable channel for information exchange. However, irresponsible people also spread Covid-19 fake news to the public, which is harmful to all parties. Hence, a fake news detector is required to tackle the problem. In this research, a Transformer-based fake news detection system is created. First, an architecture is designed using Bidirectional Encoder Representations from Transformers (BERT). Three augmentation methods, namely spell-checking-based, acronym-based, and typography-based augmentation, are then developed to improve the BERT model. A comprehensive evaluation is performed with 5-fold cross-validation on eleven thousand Twitter posts using four metrics: Accuracy, Precision, Recall, and F1-Score. Experimental results indicate that all three proposed augmentation methods can improve BERT's performance in detecting fake news related to Covid-19. The acronym-based augmentation gives a small improvement, the spell-checking-based one provides a moderate improvement, and the typography-based one offers the most significant improvement.
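
The abstract names the augmentation types and the evaluation protocol but gives no implementation details, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. The acronym table, the function names, and the reading of "typography-based" as adjacent-letter swaps are all hypothetical. Spell-checking-based augmentation is left out because it needs an external dictionary. The last two helpers show a plain 5-fold split and the four metrics for a binary fake/real label.

```python
import random

# Hypothetical acronym table; the paper's actual list is not given in the abstract.
ACRONYMS = {
    "WHO": "World Health Organization",
    "ICU": "intensive care unit",
}

def expand_acronyms(text, table=ACRONYMS):
    """Acronym-based variant (assumed form): replace known acronyms with long forms."""
    return " ".join(table.get(word, word) for word in text.split())

def typo_variant(text, rng, rate=0.1):
    """Typography-based variant (assumed form): swap two adjacent inner letters."""
    out = []
    for word in text.split():
        if len(word) > 3 and word.isalpha() and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # keep first and last letters fixed
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

A common practice, not confirmed by the abstract, is to augment only the training indices of each fold so that augmented copies of a tweet never leak into the held-out fold.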