Identification of Misconceptions about Corona Outbreak Using Trigrams and Weighted TF-IDF Model

Sujatha Arun Kokatnoor, Balachandran Krishnan
Journal: Journal of Advanced Research in Dynamical and Control Systems
Published: 2020-05-30
DOI: https://doi.org/10.5373/jardcs/v12sp5/20201788
Citations: 1

Abstract

Misconceptions about issues such as health, diseases, politics, government policies, epidemics, and pandemics have been a social problem for a number of years, particularly since the advent of social media, and they often spread faster than the truth. Social media platforms such as Twitter, now among the most prominent news outlets, remain a major source of information today, particularly for information distributed across the network. In this paper, the efficacy of a Misconception Detection System was tested on a Corona Pandemic Dataset extracted from Twitter posts. A trigram model and a weighted TF-IDF model, followed by a supervised classifier, were used to categorize the dataset into two classes: one containing misconceptions about the COVID-19 virus and the other comprising correct and authenticated information. Trigrams were more reliable because the functional words related to coronavirus appeared more frequently in the corpus created. The proposed system, combining trigrams with weighted TF-IDF, produced relevant, normalized scores that led to efficient creation of the vector space model. It yielded good performance compared with traditional approaches based on Bag of Words and the Count Vectorizer technique, in which the vector space model is built from word counts alone. © 2020, Institute of Advanced Scientific Research, Inc. All rights reserved.
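The contrast the abstract draws between count-based vectors and trigram TF-IDF vectors can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the authors' weighting scheme, tokenizer, or classifier, so the sketch uses whitespace tokenization, a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1, and L2 normalization. The function names and the three toy sentences are hypothetical and are not taken from the paper's dataset. The supervised classifier step is omitted.

```python
import math
from collections import Counter


def trigrams(text):
    """Split text into lowercase word trigrams (three consecutive tokens)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def count_vectors(docs):
    """Baseline: raw trigram counts per document (count-only vector space)."""
    return [Counter(trigrams(d)) for d in docs]


def tfidf_vectors(docs):
    """Trigram TF-IDF with smoothed IDF and L2 normalization.

    Assumption: the smoothing and normalization are illustrative choices,
    not the weighting used in the paper.
    """
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each trigram.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vec = {
            t: (f / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, f in c.items()
        }
        # Scale each document vector to unit length.
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


# Hypothetical toy corpus, invented for this example.
toy_docs = [
    "drinking hot water cures the virus",
    "drinking hot water is not a cure",
    "wash your hands with soap often",
]
counts = count_vectors(toy_docs)
weights = tfidf_vectors(toy_docs)
shared, rare = "drinking hot water", "hot water cures"
print(counts[0][shared], counts[0][rare])
print(weights[0][shared] < weights[0][rare])
```

In the first toy document both trigrams occur once, so a count-only model treats them identically. Under the assumed TF-IDF weighting, the trigram shared by two documents receives a lower weight than the one unique to the first document, and each document vector has unit length.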