Debunking health fake news with domain specific pre-trained model

Global Transitions Proceedings Pub Date : 2021-11-01 DOI:10.1016/j.gltp.2021.08.038

Santoshi Kumari, Harshitha K Reddy, Chandan S Kulkarni, Vanukuri Gowthami

Global Transitions Proceedings, Volume 2, Issue 2, Pages 267-272. Publication date: 2021-11-01. Article page: https://www.sciencedirect.com/science/article/pii/S2666285X21000662

Citations: 4

Abstract

The COVID-19 pandemic has made the impact of health misinformation clearer than ever. It is now much easier to publish health-related articles online without validation, and these articles are shared across social media, contributing to the spread of health fake news. Such news is spread with the intent to damage the image of a person or product, to increase sales of a product, or to promote a product. In recent research, many useful health misinformation detection models use BERT (Bidirectional Encoder Representations from Transformers), which is pretrained on unlabeled text extracted from English Wikipedia and the BookCorpus, and these models mostly deal with health misinformation on social media. Therefore, a self-ensemble SciBERT (Scientific BERT) based model that makes use of domain-specific word embeddings is proposed for detecting health misinformation specifically in news, an area that is less explored. The paper also presents a dataset that combines the existing FakeHealth dataset with a custom dataset of health articles scraped from the news fact-checking website Snopes.com. Classification results show that the proposed model achieves a weighted F1 score of 0.715.
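The two technical ideas named in the abstract, self-ensembling and the weighted F1 score, can be illustrated with a minimal sketch. The abstract does not say how the ensemble is formed, so the code below assumes one common reading: class probabilities from several checkpoints of the same fine-tuned model are averaged before taking the argmax. The function names (`average_probabilities`, `argmax_labels`, `weighted_f1`) are hypothetical and are not taken from the paper. The metric is the support-weighted mean of per-class F1 scores.

```python
from collections import Counter


def average_probabilities(checkpoint_probs):
    """Average class probabilities across checkpoints of one model.

    checkpoint_probs: one entry per checkpoint, each a list of
    per-article probability rows (one float per class).
    ASSUMPTION: prediction averaging is only one possible form of
    self-ensembling; the paper's exact scheme is not given in the abstract.
    """
    n_checkpoints = len(checkpoint_probs)
    return [
        [sum(col) / n_checkpoints for col in zip(*rows)]
        for rows in zip(*checkpoint_probs)  # rows: same article, all checkpoints
    ]


def argmax_labels(prob_rows):
    """Pick the most probable class index for each article."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights equal to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score
```

Weighting by support matters when the fake and real classes are imbalanced, because the larger class then contributes proportionally more to the reported score.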

