Nitish Ranjan Bhowmik, M. Arifuzzaman, M. Mondal, Md. Saiful Islam
Natural Language Processing Research, 2021-03-01. DOI: 10.2991/NLPR.D.210316.001 (https://doi.org/10.2991/NLPR.D.210316.001)
Bangla Text Sentiment Analysis Using Supervised Machine Learning with Extended Lexicon Dictionary
With the proliferation of the Internet's social digital content, sentiment analysis (SA) has gained wide research interest in natural language processing (NLP). Little significant research has been done in the Bangla language domain because of the intricate grammatical structure of Bangla text. This paper focuses on SA in the context of the Bangla language. First, a domain-specific, categorical, weighted lexicon data dictionary (LDD) is developed for analyzing sentiment in Bangla. The LDD is built by applying normalization, tokenization, and stemming to two Bangla datasets available in a GitHub repository. Second, a novel rule-based algorithm termed Bangla Text Sentiment Score (BTSC) is developed for detecting sentence polarity. The algorithm considers part-of-speech-tagged words and special characters to generate a score for a word, and from that a score for a sentence and for a blog. The BTSC algorithm, together with the LDD, is applied to extract sentiments by generating scores for the two Bangla datasets. Third, two feature matrices are developed by applying term frequency-inverse document frequency (tf-idf) to the two datasets and by using the corresponding BTSC scores. Next, supervised machine learning classifiers are applied to the feature matrices.
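The pipeline outlined in the abstract (lexicon lookup, a per-sentence score, then tf-idf features combined with that score) can be illustrated with a minimal sketch. This is not the authors' BTSC algorithm or their LDD: the four-word `TOY_LEXICON`, its weights, the whitespace tokenizer, the unsmoothed tf-idf formula, and the zero threshold in `polarity` are all assumptions chosen for illustration, and the part-of-speech and special-character rules mentioned in the abstract are omitted.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (token -> weight); NOT the paper's LDD.
TOY_LEXICON = {"ভালো": 1.0, "সুন্দর": 0.8, "খারাপ": -1.0, "বাজে": -0.9}


def tokenize(text):
    """Split on whitespace after replacing the danda and basic punctuation."""
    for ch in "।,!?":
        text = text.replace(ch, " ")
    return text.split()


def sentence_score(tokens, lexicon):
    """Sum the lexicon weights of the tokens; unknown tokens contribute 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def polarity(score):
    """Map a score to a label; the zero threshold is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tfidf_matrix(docs):
    """Plain tf-idf: (count / doc length) * log(N / document frequency)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([
            (counts[tok] / len(doc)) * math.log(n_docs / doc_freq[tok])
            if doc else 0.0
            for tok in vocab
        ])
    return vocab, rows


# Two toy sentences: "the film is very good", "the food is very bad".
texts = ["ছবিটা খুব ভালো।", "খাবার খুব খারাপ।"]
docs = [tokenize(t) for t in texts]

scores = [sentence_score(doc, TOY_LEXICON) for doc in docs]
labels = [polarity(s) for s in scores]

vocab, tfidf_rows = tfidf_matrix(docs)
# One possible feature matrix: tf-idf columns plus the lexicon score column.
features = [row + [s] for row, s in zip(tfidf_rows, scores)]
```

Each row of `features` could then be passed, with its label, to any supervised classifier. Appending the score as an extra column is only one way to combine the two signals; the abstract does not say how the two feature matrices are constructed in detail.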