Identification of Misconceptions about Corona Outbreak Using Trigrams and Weighted TF-IDF Model

Journal of Advanced Research in Dynamical and Control Systems | Pub Date: 2020-05-30 | DOI: 10.5373/jardcs/v12sp5/20201788

Sujatha Arun Kokatnoor, Balachandran Krishnan


Citations: 1

Abstract

Misconceptions about issues such as health, diseases, politics, government policies, epidemics, and pandemics have been a social problem for a number of years, particularly since the advent of social media, and they often spread faster than the truth. Social media platforms such as Twitter, now among the most prominent news outlets, remain a major source of information today, particularly of information distributed across the network. In this paper, the efficacy of a Misconception Detection System was tested on a Corona Pandemic Dataset extracted from Twitter posts. A trigram and weighted TF-IDF model followed by a supervised classifier was used to categorize the dataset into two classes: one containing misconceptions about the COVID-19 virus and the other comprising correct and authenticated information. Trigrams were more reliable because the functional words related to coronavirus appeared more frequently in the corpus created. The proposed system, using a combination of trigrams and weighted TF-IDF, gave a relevant and normalized score, leading to an efficient creation of the vector space model. This yielded good performance results when compared with traditional approaches using the Bag of Words and Count Vectorizer techniques, where the vector space model was created only through word counts. © 2020, Institute of Advanced Scientific Research, Inc. All rights reserved.
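The contrast the abstract draws between count-based vectors and normalized trigram TF-IDF vectors can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, the tokenizer, or whether the trigrams are over words or characters, so the code below assumes word trigrams, a standard smoothed IDF, and L2 normalization. The function names and the toy tweets are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def word_trigrams(text):
    """Lowercase, split on whitespace, and return consecutive word triples."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def count_vectors(docs):
    """Count-based baseline: raw trigram occurrence counts per document."""
    return [Counter(word_trigrams(d)) for d in docs]


def tfidf_vectors(docs):
    """Smoothed TF-IDF over trigrams, L2-normalized per document.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, a common smoothed form;
    the paper's own "weighted" variant may differ.
    """
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Each Counter yields a trigram once, so this counts documents per trigram.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors that are already L2-normalized."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Invented toy tweets for illustration only.
tweets = [
    "drinking hot water cures the corona virus",
    "hot water cures the corona virus instantly",
    "wash your hands to slow the corona virus",
]
vectors = tfidf_vectors(tweets)
```

In this sketch a trigram that occurs in every tweet, such as "the corona virus", receives a lower weight than one that occurs in a single tweet, whereas a pure count vector would treat both equally. The resulting sparse vectors could then be passed to any supervised classifier; the abstract does not name the one the authors used.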


Source journal

Journal of Advanced Research in Dynamical and Control Systems

Self-citation rate

0.00%

Publication count