Multiclass Classification for Bangla News Tags with Parallel CNN Using Word Level Data Augmentation
Authors: Ruhul Amin, Nabila Sabrin Sworna, Nahid Hossain
Venue: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 174-177
Published: 2020-06-05
DOI: https://doi.org/10.1109/TENSYMP50017.2020.9230981
Text mining is the process of exploring large volumes of unstructured text. With vast amounts of text now available through online blogs, newspapers, and other media, text classification and categorization has become an active research topic. Much work has been done on this topic for English and other Western languages, but very little notable work exists for Bangla. The lack of a substantial Bangla dataset is a further obstacle to developing a high-performance text classification tool. In this paper, we present an approach to classifying Bangla news tags. Classification is based solely on news titles, using a parallel Convolutional Neural Network (CNN), a type of deep neural network, together with word-level data augmentation. Because no suitable, up-to-date dataset of Bangla news titles and tags was available, we built our own by scraping online newspapers; it contains 88,968 news titles with their tags. Our approach achieves a classification accuracy of 93.47%, which is the highest among similar works.
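
The abstract does not say which word-level augmentation operations were applied. As a rough illustration only, the sketch below assumes two common choices, random word deletion and random word swap, applied to whitespace-tokenized titles. The function names (`augment_title`, `augment_dataset`) and the parameters (`n_swaps`, `p_delete`, `copies`) are hypothetical and are not taken from the paper.

```python
import random


def augment_title(title, n_swaps=1, p_delete=0.1, seed=None):
    """Return a perturbed copy of a whitespace-tokenized title.

    Assumed operations (not confirmed by the paper):
    random word deletion followed by random word swap.
    """
    rng = random.Random(seed)
    words = title.split()

    # Random deletion: drop each word with probability p_delete,
    # but never reduce a non-empty title to nothing.
    kept = [w for w in words if rng.random() >= p_delete]
    if not kept and words:
        kept = [rng.choice(words)]

    # Random swap: exchange two distinct positions, n_swaps times.
    for _ in range(n_swaps):
        if len(kept) < 2:
            break
        i, j = rng.sample(range(len(kept)), 2)
        kept[i], kept[j] = kept[j], kept[i]

    return " ".join(kept)


def augment_dataset(pairs, copies=2, seed=0):
    """Expand (title, tag) pairs with `copies` augmented variants each.

    The original pair is kept, and every variant inherits its tag.
    """
    out = []
    for idx, (title, tag) in enumerate(pairs):
        out.append((title, tag))
        for k in range(copies):
            variant_seed = seed + idx * copies + k
            out.append((augment_title(title, seed=variant_seed), tag))
    return out
```

Calling `augment_dataset(pairs, copies=2)` on a list of `(title, tag)` tuples returns three entries per input pair: the original title plus two perturbed variants carrying the same tag. Under these assumptions, augmentation of this kind would be applied to the training split only, so that evaluation still runs on unmodified titles.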