Performance Analysis of Most Prominent Machine Learning and Deep Learning Algorithms In Classifying Bangla Crime News Articles
Salma Tabashum, M. M. Hossain, Md. Ariful Islam, Mun Yea Mahafi Taz Zahara, Fahmida Naznin Fami
2020 IEEE Region 10 Symposium (TENSYMP), pp. 1273-1277, 2020. DOI: 10.1109/TENSYMP50017.2020.9230785
This work is dedicated to Bangla crime type classification, a task for which very little prior work exists. To carry out this research, we first developed a Bangla crime dataset of around 24,295 news articles and made most of them publicly available on GitHub. We then built a crime classifier model and trained it on this dataset. We analyzed word-vector representations such as bag-of-words and TF-IDF with state-of-the-art machine learning algorithms, as well as the most promising semantic and syntactic word embeddings, namely Word2Vec, GloVe, and fastText, with both shallow and deep CNN and RNN models, in order to select the best word embeddings for our classifier module. The experimental results, summarized in tabular form, show that deep learning algorithms achieve significantly higher accuracy than state-of-the-art machine learning algorithms in classifying Bangla crime data. The final results show that the proposed model, a shallow CNN with fastText embeddings, achieves 93.70% accuracy.
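The abstract contrasts sparse representations (bag-of-words, TF-IDF) fed to classical machine learning algorithms with dense embeddings fed to CNN and RNN models. As a minimal sketch of the sparse side only, assuming a smoothed IDF variant (one of several in common use), the Python below builds L2-normalized TF-IDF vectors from pre-tokenized documents and classifies them with a nearest-centroid rule. The abstract does not name the machine learning algorithms that were used, so the nearest-centroid classifier is a stand-in chosen for brevity, not the authors' method; the toy corpus, the labels, and every function name here are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: English placeholder tokens stand in for tokenized
# Bangla news text, and the labels are illustrative, not the paper's classes.
train_docs = [
    ["knife", "stabbed", "victim"],
    ["stabbed", "body", "found"],
    ["gold", "stolen", "shop"],
    ["thieves", "stolen", "phone"],
]
train_labels = ["murder", "murder", "theft", "theft"]


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit length; an all-zero vector becomes {}."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalized; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    return l2_normalize({t: c * idf[t] for t, c in counts.items()})


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_centroids(docs, labels, idf):
    """Nearest-centroid training: sum each class's TF-IDF vectors, then normalize
    (equivalent to normalizing the class mean)."""
    sums = {}
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    return {label: l2_normalize(dict(acc)) for label, acc in sums.items()}


def predict(doc, idf, centroids):
    """Return the label whose centroid is most cosine-similar to the document."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


idf = fit_idf(train_docs)
centroids = train_centroids(train_docs, train_labels, idf)
print(predict(["victim", "stabbed"], idf, centroids))  # → murder
```

Terms that occur in many documents (here "stolen" and "stabbed") receive a lower IDF weight than rarer ones, which is the property that distinguishes TF-IDF from plain bag-of-words counts. A real pipeline for Bangla news would also need a Bangla tokenizer, and the best-performing model reported above, a shallow CNN over fastText embeddings, is outside the scope of this sketch.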