Fake news detection on Hindi news dataset

Global Transitions Proceedings. Pub Date: 2022-06-01. Epub Date: 2022-04-02. DOI: 10.1016/j.gltp.2022.03.014

Sudhanshu Kumar, Thoudam Doren Singh

Volume 3, Issue 1, Pages 289-297. Article page: https://www.sciencedirect.com/science/article/pii/S2666285X2200019X

Citations: 16

Abstract




With the growth of social networks, more people than ever are creating and sharing information, much of which has no basis in reality. As a result, fake news serving various political and commercial purposes spreads quickly, and online newspapers have made it challenging to identify trustworthy news sources. In this work, Hindi news articles are collected from various news sources. The preprocessing, feature extraction, classification, and prediction processes are discussed in detail. Preprocessing includes data cleaning, stop-word removal, tokenization, and stemming, and term frequency-inverse document frequency (TF-IDF) is used for feature extraction. Three classifiers, Naïve Bayes, logistic regression, and Long Short-Term Memory (LSTM), are used and compared for fake news detection with probability of truth. Among these three classifiers, LSTM achieved the best accuracy, 92.36%.
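The pipeline outlined in the abstract (preprocessing, TF-IDF feature extraction, classification with a probability output) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the toy English headlines, the smoothed IDF formula, and all function names are assumptions made for illustration, stemming is omitted, and only the Naïve Bayes classifier is shown, using the Python standard library.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's actual Hindi list is not given here.
STOP_WORDS = {"the", "a", "of", "is"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are ignored."""
    tf = Counter(tok for tok in doc if tok in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    totals = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            totals[label][term] += weight
    model = {}
    for label in set(labels):
        denom = sum(totals[label].values()) + alpha * len(vocab)
        log_like = {t: math.log((totals[label][t] + alpha) / denom) for t in vocab}
        log_prior = math.log(labels.count(label) / len(labels))
        model[label] = (log_prior, log_like)
    return model


def predict_proba(model, vec):
    """Posterior probability of each label for one TF-IDF vector."""
    scores = {
        label: prior + sum(w * like[t] for t, w in vec.items())
        for label, (prior, like) in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}


# Toy, made-up headlines (not taken from the paper's dataset).
train_texts = [
    "government announces new budget for schools",
    "court issues verdict in the land case",
    "miracle fruit cures all disease overnight",
    "aliens secretly control the election",
]
train_labels = ["real", "real", "fake", "fake"]

train_docs = [preprocess(t) for t in train_texts]
idf = fit_idf(train_docs)
model = train_nb([tfidf(d, idf) for d in train_docs], train_labels, set(idf))

probs = predict_proba(model, tfidf(preprocess("miracle fruit cures election"), idf))
print(probs)
```

The value `probs["real"]` plays the role of the "probability of truth" mentioned in the abstract. A logistic regression or LSTM classifier would consume the same preprocessed input, but those are not sketched here.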

