Title: How Does Chinese Segmentation Strategy Effect on Sentiment Analysis of Short Text?
Authors: Qing Lei, Haifeng Li, Yanxi Chen
Published in: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)
Publication date: 2021-07-16
DOI: 10.1109/PRML52754.2021.9520738 (https://doi.org/10.1109/PRML52754.2021.9520738)
Citations: 0
Abstract
In Chinese natural language processing, one particular problem is how to choose a word segmentation strategy; the common options are char-based and word-based. For sentiment analysis of short text, as opposed to long text, word-based segmentation faces a further problem: short text contains more ambiguous or unregistered (out-of-vocabulary) words. The features extracted by different Chinese word segmentation methods affect the statistical distribution of features and, in turn, the accuracy of sentiment analysis. This paper evaluates the effect of five Chinese segmentation strategies on sentiment analysis of short text. We chose two word-based Chinese Word Segmentation (CWS) methods and three char-based n-gram schemes, then transformed the Bag-of-Words (BOW) representation into a Vector Space Model (VSM), which was finally fed into several classifiers to predict the sentiment polarity of short text. To reduce the impact of any single corpus, the study is based on a collection of five public corpora.
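The pipeline described in the abstract (segment, build a bag of words, map to a vector space) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract names neither the two word-based segmenters nor the n-gram sizes, so the sketch assumes n = 1, 2, 3 and uses a toy forward-maximum-matching segmenter with a hypothetical three-word lexicon as a stand-in for a real CWS tool. The example sentences and the helper names `char_ngrams`, `fmm_segment`, and `to_vsm` are likewise illustrative.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text (n=1 gives single characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fmm_segment(text, lexicon, max_len=4):
    """Toy forward maximum matching: an assumed stand-in for a real word-based segmenter.

    At each position, take the longest lexicon entry that matches; characters
    not covered by the lexicon (unregistered words) fall back to single characters.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def to_vsm(token_lists):
    """Build a shared vocabulary and return (vocab, raw term-count vectors)."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


# Two made-up short texts ("this film is good" / "this film is not good").
docs = ["这部电影很好看", "这部电影不好看"]
lexicon = {"这部", "电影", "好看"}  # hypothetical toy lexicon

strategies = {
    "word (FMM)": lambda t: fmm_segment(t, lexicon),
    "char 1-gram": lambda t: char_ngrams(t, 1),
    "char 2-gram": lambda t: char_ngrams(t, 2),
    "char 3-gram": lambda t: char_ngrams(t, 3),
}

for name, tokenize in strategies.items():
    vocab, vectors = to_vsm([tokenize(d) for d in docs])
    print(f"{name}: {len(vocab)} features, vectors={vectors}")
```

Each strategy yields a different vocabulary size and a different count distribution for the same two sentences, which is the effect on feature statistics that the abstract refers to. In a full experiment the resulting vectors would be passed to a classifier; that step is omitted here.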