Improvement of Sentiment Analysis Based on Clustering of Word2Vec Features

Eissa Alshari, A. Azman, S. Doraisamy, N. Mustapha, Mustafa Alkeshr
Published in: 2017 28th International Workshop on Database and Expert Systems Applications (DEXA), 2017-08-01
DOI: 10.1109/DEXA.2017.41 (https://doi.org/10.1109/DEXA.2017.41)
Citations: 33

Abstract

Recently, many researchers have shown interest in using Word2Vec features for text classification tasks such as sentiment analysis. Its ability to model high-quality distributional semantics among words has contributed to its success in many of these tasks. However, the high dimensionality of Word2Vec features increases the complexity of the classifier. This paper proposes a method for constructing a Word2Vec-based feature set for sentiment analysis. The method clusters the terms in the vocabulary around a set of opinion words drawn from a sentiment lexicon. The feature set for classification is then constructed from the resulting set of clusters. The effectiveness of the proposed method is evaluated on the Internet Movie Review Dataset with two classifiers, Support Vector Machine and Logistic Regression. The results are promising, showing that the proposed method can be more effective than the baseline approaches.
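The abstract does not give the exact clustering rule or feature weighting, so the following is only a minimal sketch of the general idea under stated assumptions: each vocabulary term is assigned to its most similar opinion word by cosine similarity, and a document is represented by the share of its tokens falling in each cluster. The toy 2-dimensional vectors, the seed words, and the helper names `cluster_vocabulary` and `document_features` are all hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_vocabulary(vectors, opinion_words):
    """Assign each vocabulary term to the index of its most similar opinion word.

    Assumption: nearest-seed assignment by cosine similarity; the paper's
    actual clustering procedure may differ.
    """
    seeds = [w for w in opinion_words if w in vectors]
    if not seeds:
        raise ValueError("no opinion word has an embedding")
    assignment = {}
    for term, vec in vectors.items():
        sims = [cosine(vec, vectors[s]) for s in seeds]
        assignment[term] = sims.index(max(sims))
    return seeds, assignment


def document_features(tokens, assignment, n_clusters):
    """Represent a document as the fraction of its known tokens in each cluster."""
    counts = [0] * n_clusters
    for tok in tokens:
        if tok in assignment:
            counts[assignment[tok]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_clusters


# Made-up 2-d vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "good": [1.0, 0.1],
    "bad": [0.1, 1.0],
    "great": [0.9, 0.2],
    "awful": [0.2, 0.9],
    "enjoyable": [0.8, 0.3],
    "boring": [0.3, 0.8],
}

seeds, assignment = cluster_vocabulary(toy_vectors, ["good", "bad"])
features = document_features(
    ["great", "enjoyable", "boring", "the"], assignment, len(seeds)
)
```

In this sketch the feature vector has one dimension per opinion-word cluster rather than one per embedding dimension or vocabulary term, which is the kind of reduction the abstract motivates. The resulting vectors could then be passed to a Support Vector Machine or Logistic Regression classifier.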