Debunking health fake news with domain specific pre-trained model

Santoshi Kumari, Harshitha K Reddy, Chandan S Kulkarni, Vanukuri Gowthami
Global Transitions Proceedings, Volume 2, Issue 2, Pages 267-272. Journal Article. Published 2021-11-01.
DOI: 10.1016/j.gltp.2021.08.038
URL: https://www.sciencedirect.com/science/article/pii/S2666285X21000662
Citations: 4

Abstract

The COVID-19 pandemic has made the impact of health misinformation clearer than ever. Health-related articles can now be published online without validation, and these articles are shared across social media, contributing to the spread of fake health news. Such news is spread with the intent to damage the image of a person or product, to increase sales of a product, or to promote a product. In recent research, many useful health misinformation detection models use BERT (Bidirectional Encoder Representations from Transformers), which is pretrained on unlabeled text extracted from English Wikipedia and the BookCorpus, and they mostly deal with health misinformation on social media. Therefore, this work proposes a self-ensemble SciBERT (Scientific BERT) based model that uses domain-specific word embeddings to detect health misinformation specifically in news, a less explored setting, together with a dataset that combines the existing FakeHealth dataset with a custom dataset of health articles scraped from the fact-checking website Snopes.com. Classification results show that the proposed model achieves a weighted F1 score of 0.715.
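
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a weighted F1 score is usually computed: the F1 of each class is averaged with weights proportional to that class's share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions; they are not taken from the paper, and the paper's own evaluation code is not shown in the abstract.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of true labels
    return score


# Hypothetical toy labels (not data from the paper): 1 = fake, 0 = real
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.75
```

Weighting by class support keeps the score from being dominated by how well the model does on a minority class, which matters when fake and real articles are not evenly represented in a dataset.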
