LimeSoda: Dataset for Fake News Detection in Healthcare Domain

Patomporn Payoungkhamdee, Peerachet Porkaew, Atthasith Sinthunyathum, Phattharaphon Songphum, Witsarut Kawidam, Wichayut Loha-Udom, P. Boonkwan, Vipas Sutantayawalee
Published in: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2021-12-21
DOI: 10.1109/iSAI-NLP54397.2021.9678187 (https://doi.org/10.1109/iSAI-NLP54397.2021.9678187)
Citations: 3

Abstract

In this paper, we present LIMESODA, our Thai fake news dataset in the healthcare domain, together with its construction guideline. Each document in the dataset is classified as fact, fake, or undefined. We also provide token-level annotations for validating classifier decisions. The five high-level annotation tags are 1) misleading headline, 2) imposter, 3) fabrication, 4) false connection, and 5) misleading content. We curated and manually annotated 7,191 documents with these tags. We evaluate our dataset with two deep learning approaches, RNN and Transformer baselines, and analyze token-level contributions to understand model behavior. For the RNN model, we use the attention weights as token-level contributions. For the Transformer models, we apply the integrated gradients method at the embedding layers. Finally, we compare these token-level contributions with the human annotations. Although our baseline models yield promising performance, we found that the tokens that support model decisions are quite different from those marked by human annotators.
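The abstract does not specify the model architectures, the integrated-gradients baseline, or the exact metric used to compare model contributions with human tags, so the following is only a minimal sketch under stated assumptions. A hypothetical toy classifier (mean-pooled token embeddings followed by a logistic output) with a closed-form gradient stands in for a real Transformer, the baseline is an all-zero embedding sequence, and `top_k_overlap` is a hypothetical agreement score: the fraction of the k highest-scoring tokens that a human also tagged. For the RNN baseline, the attention weights would take the place of the integrated-gradients scores in the same comparison.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(embeddings: Sequence[Vector], w: Vector, b: float) -> float:
    """Hypothetical classifier: P(fake) = sigmoid(w . mean(embeddings) + b)."""
    n = len(embeddings)
    pooled = [sum(e[d] for e in embeddings) / n for d in range(len(w))]
    return sigmoid(sum(wd * pd for wd, pd in zip(w, pooled)) + b)


def toy_grad(embeddings: Sequence[Vector], w: Vector, b: float) -> List[Vector]:
    """Closed-form gradient of toy_model with respect to each token embedding."""
    p = toy_model(embeddings, w, b)
    scale = p * (1.0 - p) / len(embeddings)
    return [[scale * wd for wd in w] for _ in embeddings]


def integrated_gradients(
    embeddings: Sequence[Vector],
    baseline: Sequence[Vector],
    w: Vector,
    b: float,
    steps: int = 50,
) -> List[float]:
    """Per-token integrated gradients, using a midpoint Riemann sum along the
    straight path from the baseline embeddings to the input embeddings."""
    avg_grad = [[0.0] * len(w) for _ in embeddings]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [
            [bd + alpha * (ed - bd) for ed, bd in zip(e, bl)]
            for e, bl in zip(embeddings, baseline)
        ]
        for i, g in enumerate(toy_grad(point, w, b)):
            for d, gd in enumerate(g):
                avg_grad[i][d] += gd / steps
    # Token score = sum over embedding dims of (input - baseline) * averaged gradient.
    return [
        sum((ed - bd) * gd for ed, bd, gd in zip(e, bl, g))
        for e, bl, g in zip(embeddings, baseline, avg_grad)
    ]


def top_k_overlap(scores: Sequence[float], human_mask: Sequence[int], k: int) -> float:
    """Hypothetical metric: share of the k top-scoring tokens that humans tagged (1)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(human_mask[i] for i in ranked) / k


# Toy usage: three tokens with 2-dimensional embeddings; humans tagged token 0.
x = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
zero_baseline = [[0.0, 0.0] for _ in x]
w, b = [1.0, -1.0], 0.0
attributions = integrated_gradients(x, zero_baseline, w, b)
overlap = top_k_overlap(attributions, human_mask=[1, 0, 0], k=1)
```

Because the toy gradient is exact, the attributions should satisfy the completeness property of integrated gradients (they sum approximately to the change in model output from baseline to input), which is a useful sanity check before porting the idea to a real Transformer with automatic differentiation.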