Bangla Fake News Detection using Machine Learning, Deep Learning and Transformer Models
Risul Islam Rasel, Anower Hossen Zihad, N. Sultana, M. M. Hoque
2022 25th International Conference on Computer and Information Technology (ICCIT), published 2022-12-17
DOI: 10.1109/ICCIT57492.2022.10055592
Abstract
News categorization, especially fake news classification, is one of the primary applications of text classification. In recent years, researchers have done extensive work on fake news detection in high-resource languages like English. However, due to a lack of resources and language processing tools, research on low-resource languages like Bangla remains limited. In this study, we build a Bangla fake news dataset by combining newly collected fake news data with available secondary datasets. The previously available datasets contained redundant data, which we reduced in our experiment. The resulting dataset contains 4,678 distinct news items. We experimented on this dataset with multiple machine learning (LR, SVM, KNN, MNB, AdaBoost, and DT), deep neural network (LSTM, BiLSTM, CNN, LSTM-CNN, BiLSTM-CNN), and transformer (Bangla-BERT, m-BERT) models to attain some state-of-the-art results. The best-performing models are CNN, CNN-LSTM, and BiLSTM, with accuracies of 95.9%, 95.5%, and 95.3%, respectively. We also tested our models on the previously existing datasets and obtained a 1.4% to 3.4% improvement in accuracy over previous results. Beyond the accuracy improvement, our models show a significant increase in recall on fake news data compared to prior studies.
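
The dataset construction and the two reported metrics can be illustrated with a minimal sketch. The abstract does not say how redundant entries were identified, so the rule used here (Unicode NFC normalization plus whitespace collapsing, then exact matching) is an assumption, as are the function names, the label encoding (1 = fake, 0 = authentic), and the toy data. It is not the authors' pipeline.

```python
import unicodedata
from typing import Iterable, List, Tuple

# (news text, label); label 1 = fake, 0 = authentic (assumed encoding)
Example = Tuple[str, int]


def normalize(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def merge_and_deduplicate(*sources: Iterable[Example]) -> List[Example]:
    """Concatenate sources, keeping the first occurrence of each normalized text."""
    seen = set()
    merged: List[Example] = []
    for source in sources:
        for text, label in source:
            key = normalize(text)
            if key and key not in seen:
                seen.add(key)
                merged.append((key, label))
    return merged


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Fraction of truly positive (fake) items that were predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data, made up for illustration only.
newly_collected = [("খবর  এক", 1), ("খবর দুই", 0)]
secondary = [("খবর এক", 1), ("খবর তিন", 1)]

merged = merge_and_deduplicate(newly_collected, secondary)
print(len(merged))  # 3: the two copies of the first item differ only in spacing

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

Reporting recall on the fake class alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy while still missing many fake items.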