{"title":"Analysis of different types of word representations and neural networks on sentiment classification tasks","authors":"Rajvardhan Patil, Nathaniel Bowman, Jeremy Wood","doi":"10.1109/iemcon53756.2021.9623193","DOIUrl":null,"url":null,"abstract":"This paper evaluates and compares the performance of sentiment analysis using traditional vector representations to the word-embedding approach, and shallow networks to recurrent and gated neural networks. In the traditional approach, we explore ways the data can be presented in discrete space and how they perform on sentiment-analysis tasks. We compare their performances with the word-embeddings approach on the same sentiment analysis tasks where the words are represented in continuous-space. We use shallow machine-learning models, such as naïve bayes, nearest neighbor, stochastic gradient descent, decision tree, logistic regression, etc. in the traditional approach. For the word-embeddings approach, we apply - RNNs, LSTMs, and GRUs to perform the analysis. RNNs were used to overcome N-gram fixed window size limitation, and GRU and LSTM were used to overcome RNN's vanishing and exploding gradient problem and to capture long distance relationships. It was found that recurrent network models and word embeddings overall do better than the shallow networks and traditional word representations.","PeriodicalId":272590,"journal":{"name":"2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iemcon53756.2021.9623193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
This paper evaluates and compares the performance of sentiment analysis using traditional vector representations versus the word-embedding approach, and shallow networks versus recurrent and gated neural networks. In the traditional approach, we explore ways the data can be represented in discrete space and how those representations perform on sentiment-analysis tasks. We compare their performance with the word-embedding approach on the same sentiment-analysis tasks, where words are represented in continuous space. In the traditional approach, we use shallow machine-learning models such as naïve Bayes, nearest neighbor, stochastic gradient descent, decision trees, and logistic regression. For the word-embedding approach, we apply RNNs, LSTMs, and GRUs to perform the analysis. RNNs were used to overcome the fixed window-size limitation of N-gram models, and GRUs and LSTMs were used to overcome the vanishing- and exploding-gradient problems of RNNs and to capture long-distance relationships. We found that recurrent network models and word embeddings overall perform better than shallow networks and traditional word representations.
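As a rough illustration of the two pipelines the abstract contrasts, the sketch below sets a discrete-space baseline (TF-IDF vectors fed to logistic regression, one of the shallow models listed) against a continuous-space model (a learned embedding layer feeding a GRU). The use of scikit-learn and PyTorch, the toy corpus, and all hyperparameters are assumptions for illustration, not the paper's actual experimental setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import torch
import torch.nn as nn

# Toy corpus standing in for a real sentiment dataset (hypothetical data).
texts = ["a wonderful, uplifting film", "dull plot and wooden acting",
         "I loved every minute", "a complete waste of time"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# --- Traditional approach: discrete (sparse) vectors + shallow model ---
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)          # documents as sparse TF-IDF vectors
clf = LogisticRegression().fit(X, labels)    # one of the shallow models listed
print(clf.predict(vectorizer.transform(["what a wonderful film"])))

# --- Embedding approach: continuous word vectors + gated recurrent network ---
class GRUClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # continuous-space word vectors
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)  # logits for two sentiment classes

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        _, hidden = self.gru(embedded)        # final hidden state summarizes the sequence
        return self.fc(hidden.squeeze(0))     # (batch, 2) class logits

model = GRUClassifier(vocab_size=1000)
dummy_batch = torch.randint(0, 1000, (4, 12))  # 4 sequences of 12 token ids
print(model(dummy_batch).shape)                # torch.Size([4, 2])

Unlike the TF-IDF baseline, where each document is a fixed bag of term weights, the GRU consumes the token sequence in order, and its gating is what lets it carry sentiment cues across long distances without the vanishing gradients a plain RNN suffers from.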