Does Dialectal Variation Matter in Term-Based Feature Selection of Sentiment Analysis?: An Investigation into Multi-dialectal Chinese Microblogs
Authors: K. C. Chan, King-wa Fu, Chung-hong Chan
Venue: Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference
Publication date: 2015-06-28
DOI: 10.1145/2786451.2786924 (https://doi.org/10.1145/2786451.2786924)
Citation count: 0
Abstract
This paper examines feature selection procedures for sentiment analysis of a multi-dialectal language. We analyzed a dataset of over 6 million microblogs from China, a multi-dialectal country, deployed a sentiment classifier to examine the positive or negative emotion carried by the microblogs, and explored regional variations in the optimal feature vectors. The results suggest that localized feature vectors can maximize classification accuracy in some regions of China, and show that the geographical distance between provinces and the use of a common dialect help explain provincial differences in the feature vectors. This research can be applied to feature vector optimization for sentiment analysis in other multicultural countries.
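As a rough illustration of what region-specific, term-based feature selection might look like, the sketch below scores terms separately for each region and then compares the selected feature sets. It is not the authors' procedure: the chi-square scoring, the Jaccard comparison, the function names (`chi_square`, `top_terms`, `jaccard`), and the toy tokens (including the placeholder `dialect_pos`) are all assumptions made for the example.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/label contingency table.

    n11: term present, positive; n10: term present, negative;
    n01: term absent, positive;  n00: term absent, negative.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms(docs, k):
    """Return the k highest-scoring terms for one region.

    docs is a list of (tokens, label) pairs with label 1 (positive)
    or 0 (negative). Ties are broken alphabetically.
    """
    n_pos = sum(1 for _, label in docs if label == 1)
    n_neg = len(docs) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in docs:
        # Count document frequency, not raw term frequency.
        (pos_df if label == 1 else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[term], neg_df[term]
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap between two feature sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy corpora for two regions; "dialect_pos" stands in for a
# regional term that carries positive sentiment only in region_b.
region_a = [
    (["good", "happy"], 1),
    (["good", "fine"], 1),
    (["bad", "sad"], 0),
    (["bad", "fine"], 0),
]
region_b = [
    (["dialect_pos", "happy"], 1),
    (["dialect_pos", "fine"], 1),
    (["bad", "sad"], 0),
    (["bad", "fine"], 0),
]

features_a = top_terms(region_a, k=2)
features_b = top_terms(region_b, k=2)
overlap = jaccard(features_a, features_b)
print(sorted(features_a), sorted(features_b), round(overlap, 3))
```

In this toy setup the two regions share only one of their top two terms, which is the kind of regional divergence in feature vectors that the abstract describes; a pairwise overlap score like this could then be related to geographical distance or shared dialect.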