Research on sentiment classification of Blog based on PMI-IR

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010). Pub Date: 2010-09-30. DOI: 10.1109/NLPKE.2010.5587849

Xiuting Duan, Tingting He, Le Song


Citations: 4

Abstract

The growth of blog text on the internet poses new challenges for Chinese text classification. To address the lack of semantic information in traditional Chinese text classification methods, this paper implements a simple unsupervised learning algorithm that classifies a blog as joy, anger, sadness, or fear. The class of a blog text is predicted from the maximum semantic orientation (SO) of the phrases in the text that contain adjectives or adverbs. The SO of a phrase is calculated as the mutual information between the phrase and the polar words, and the SO of the blog text is then determined by the maximum mutual information value. For example, a blog text is classified as joy if the SO of its phrases is joy. The method is tested on two corpora: the Blog corpus collected by the Monitor and Research Center for National Language Resource Network Multimedia Sub-branch Center, and the Chinese dataset provided by the COAE2008 task. On both datasets, the method achieves a substantial improvement over the traditional methods.
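The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the seed (polar) words, the search engine, the co-occurrence window, or any smoothing, so the single seed word per emotion, the hit counts, the `eps` smoothing term, and the function names `pmi` and `classify_blog` below are all assumptions made for illustration. The sketch estimates pointwise mutual information from hit counts and assigns the blog the emotion with the largest value over all extracted phrases.

```python
import math

# Assumption: one seed (polar) word per emotion class; the paper's actual
# polar words are not listed in the abstract.
EMOTIONS = ("joy", "anger", "sadness", "fear")


def pmi(cooccur, hits_phrase, hits_seed, total, eps=0.01):
    """Estimate PMI = log2(p(phrase, seed) / (p(phrase) * p(seed))) from hit counts.

    eps is an assumed smoothing term that avoids log(0) when a phrase and a
    seed word never co-occur.
    """
    return math.log2(((cooccur + eps) * total) / (hits_phrase * hits_seed))


def classify_blog(phrases, hits, near_hits, total):
    """Return (emotion, score) for the largest PMI over all phrase/emotion pairs."""
    best_label, best_score = None, -math.inf
    for phrase in phrases:
        for emotion in EMOTIONS:
            score = pmi(near_hits.get((phrase, emotion), 0),
                        hits[phrase], hits[emotion], total)
            if score > best_score:
                best_label, best_score = emotion, score
    return best_label, best_score


# Hypothetical hit counts, invented purely to exercise the code.
TOTAL_DOCS = 1_000_000
HITS = {"very happy": 1000, "quite upset": 800,
        "joy": 5000, "anger": 4000, "sadness": 4500, "fear": 3000}
NEAR_HITS = {("very happy", "joy"): 200, ("very happy", "sadness"): 2,
             ("quite upset", "anger"): 50, ("quite upset", "sadness"): 40}

label, score = classify_blog(["very happy", "quite upset"], HITS, NEAR_HITS, TOTAL_DOCS)
print(label)  # -> joy
```

In this toy data the phrase "very happy" co-occurs with the joy seed far more often than chance would predict, so its PMI is the maximum and the text is labelled joy. In a PMI-IR setting the counts would come from search-engine queries rather than a hard-coded table.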

