{"title":"印度新闻数据集的假新闻检测","authors":"Sudhanshu Kumar, Thoudam Doren Singh","doi":"10.1016/j.gltp.2022.03.014","DOIUrl":null,"url":null,"abstract":"<div><p>With the increase in social networks, more number of people are creating and sharing information than ever before, many of them have no relevance to reality. Due to this, fake news for various political and commercial purposes are spreading quickly. Online newspaper has made it challenging to identify trustworthy news sources. In this work, Hindi news articles from various news sources are collected. Preprocessing, feature extraction, classification and prediction processes are discussed in detail. Different machine learning algorithms such as Naïve Bayes, logistic regression and Long Short-Term Memory (LSTM) are used to detect the fake news. The preprocessing step includes data cleaning, stop words removal, tokenizing and stemming. Term frequency inverse document frequency(TF-IDF) is used for feature extraction. Naïve Bayes, logistic regression and LSTM classifiers are used and compared for fake news detection with probability of truth. It is observed that among these three classifiers, LSTM achieved best accuracy of 92.36%.</p></div>","PeriodicalId":100588,"journal":{"name":"Global Transitions Proceedings","volume":"3 1","pages":"Pages 289-297"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666285X2200019X/pdfft?md5=158942440d14be3e63b882da17dba987&pid=1-s2.0-S2666285X2200019X-main.pdf","citationCount":"16","resultStr":"{\"title\":\"Fake news detection on Hindi news dataset\",\"authors\":\"Sudhanshu Kumar, Thoudam Doren Singh\",\"doi\":\"10.1016/j.gltp.2022.03.014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>With the increase in social networks, more number of people are creating and sharing information than ever before, many of them have no relevance to reality. Due to this, fake news for various political and commercial purposes are spreading quickly. Online newspaper has made it challenging to identify trustworthy news sources. In this work, Hindi news articles from various news sources are collected. Preprocessing, feature extraction, classification and prediction processes are discussed in detail. Different machine learning algorithms such as Naïve Bayes, logistic regression and Long Short-Term Memory (LSTM) are used to detect the fake news. The preprocessing step includes data cleaning, stop words removal, tokenizing and stemming. Term frequency inverse document frequency(TF-IDF) is used for feature extraction. Naïve Bayes, logistic regression and LSTM classifiers are used and compared for fake news detection with probability of truth. It is observed that among these three classifiers, LSTM achieved best accuracy of 92.36%.</p></div>\",\"PeriodicalId\":100588,\"journal\":{\"name\":\"Global Transitions Proceedings\",\"volume\":\"3 1\",\"pages\":\"Pages 289-297\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2666285X2200019X/pdfft?md5=158942440d14be3e63b882da17dba987&pid=1-s2.0-S2666285X2200019X-main.pdf\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Global Transitions Proceedings\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666285X2200019X\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Global Transitions Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666285X2200019X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
With the increase in social networks, more number of people are creating and sharing information than ever before, many of them have no relevance to reality. Due to this, fake news for various political and commercial purposes are spreading quickly. Online newspaper has made it challenging to identify trustworthy news sources. In this work, Hindi news articles from various news sources are collected. Preprocessing, feature extraction, classification and prediction processes are discussed in detail. Different machine learning algorithms such as Naïve Bayes, logistic regression and Long Short-Term Memory (LSTM) are used to detect the fake news. The preprocessing step includes data cleaning, stop words removal, tokenizing and stemming. Term frequency inverse document frequency(TF-IDF) is used for feature extraction. Naïve Bayes, logistic regression and LSTM classifiers are used and compared for fake news detection with probability of truth. It is observed that among these three classifiers, LSTM achieved best accuracy of 92.36%.