Transfer Learning of Pre-trained Transformers for Covid-19 Hoax Detection in Indonesian Language

Lya Hulliyyatus Suadaa, Ibnu Santoso, Amanda Tabitha Bulan Panjaitan
Journal: IJCCS Indonesian Journal of Computing and Cybernetics Systems
Published: 2021-07-31
DOI: 10.22146/IJCCS.66205 (https://doi.org/10.22146/IJCCS.66205)
Citations: 7

Abstract

The internet has become the most popular source of news, yet it is difficult to assess whether an online news article is fact or hoax. Hoaxes related to Covid-19 have had a harmful effect on people's lives, so an accurate hoax detection system is important for filtering the abundant information on the internet. This research proposes a Covid-19 hoax detection system based on transfer learning of pre-trained transformer models. The original pre-trained BERT, the multilingual pre-trained mBERT, and the monolingual pre-trained IndoBERT were fine-tuned to solve the classification task in the hoax detection system. In the experiments, among the uncased versions, fine-tuned IndoBERT models trained on a monolingual Indonesian corpus outperformed the fine-tuned original and multilingual BERT. However, the fine-tuned mBERT cased model, trained on a larger corpus, achieved the best performance overall.
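The abstract contrasts cased and uncased model versions. As a rough illustration only (not the authors' preprocessing code), the sketch below approximates the text normalization that uncased BERT-style checkpoints apply before tokenization: lowercasing followed by stripping accent marks. The function names and the sample headline are hypothetical and do not come from the paper or its dataset.

```python
import unicodedata


def uncased_normalize(text: str) -> str:
    """Approximate uncased BERT normalization: lowercase, then strip accents."""
    lowered = text.lower()
    # NFD splits an accented character into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop the combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def cased_normalize(text: str) -> str:
    """Cased checkpoints keep capitalization and accents, so text is unchanged."""
    return text


# Hypothetical headline, invented for illustration.
headline = "Vaksin Covid-19 dari WHO Tiba di Jakarta"
print(uncased_normalize(headline))
print(cased_normalize(headline))
```

With uncased normalization, an acronym such as "WHO" becomes indistinguishable from an ordinary lowercase word, whereas a cased model still sees the capitalization. Whether this contributes to the result reported for the cased mBERT model is not stated in the abstract.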