Comparing Sentiment Analysis and Document Representation Methods of Amazon Reviews

2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), pp. 283-286. Pub Date: 2018-09-01. DOI: 10.1109/SISY.2018.8524814

Katic Tamara, Nemanja Milićević


Citations: 12

Abstract

In the last few years, sentiment analysis has made considerable progress. It has been used in a range of applications to identify opinions about people, products, brands, services, and so on, which can, for example, help improve a company's business. Some of these applications claim to use document representation models that are more effective than plain Information Retrieval approaches such as the bag-of-words representation, and interest in such models has grown as a way to overcome some of the limitations of bag-of-words. This paper compares several sentiment analysis and document representation methods on Amazon reviews. We use traditional models, namely bag-of-words, bag-of-ngrams, and their TF-IDF variants combined with linear classifiers such as Logistic Regression and SVM, as well as deep learning models, namely word-based convolutional neural networks (ConvNets) and a simple long short-term memory (LSTM) recurrent neural network. We also test several document representation techniques: Paragraph Vector, and pre-trained Word2Vec and GloVe word embeddings used to compute a vector for each word in the document, with the word vectors aggregated by their element-wise mean. We show that deep learning models outperform traditional models on our large dataset; the LSTM achieves the best accuracy, 95.55%. As the training set grows, deep learning models generally outperform traditional models. Our best-performing model can be used for automatic sentiment classification of future product reviews in retail stores.
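As an illustration of the traditional baseline described in the abstract (a TF-IDF bag-of-ngrams representation fed to a linear classifier), here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes lowercase whitespace tokenization, unigrams plus bigrams, the smoothed IDF variant ln((1 + N) / (1 + df)) + 1 that scikit-learn's `TfidfVectorizer` uses by default, and logistic regression fitted by plain per-example gradient descent. All function names and the four example reviews are hypothetical, made up for illustration and not taken from the paper's Amazon dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """All contiguous n-grams of length 1..n_max, joined with spaces (bag-of-ngrams)."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs, n_max=2):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1 for every n-gram seen in training."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc.lower().split(), n_max)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(doc, idf, n_max=2):
    """Sparse, L2-normalised TF-IDF vector; n-grams unseen in training are dropped."""
    counts = Counter(g for g in ngrams(doc.lower().split(), n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {g: w / norm for g, w in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, weights, bias):
    """Linear score w.x + b over a sparse feature dict."""
    return bias + sum(weights.get(g, 0.0) * v for g, v in x.items())


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression fitted by per-example gradient descent on log-loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = sigmoid(score(x, weights, bias)) - y  # d(log-loss)/dz
            bias -= lr * err
            for g, v in x.items():
                weights[g] = weights.get(g, 0.0) - lr * err * v
    return weights, bias


def predict(x, weights, bias):
    """1 = positive review, 0 = negative review."""
    return 1 if score(x, weights, bias) > 0 else 0


# Hypothetical toy reviews (1 = positive, 0 = negative), not the paper's data.
train_docs = [
    "great product works perfectly",
    "love it great value",
    "terrible quality broke fast",
    "waste of money terrible",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
weights, bias = train_logreg([tfidf(d, idf) for d in train_docs], train_labels)
print(predict(tfidf("great value", idf), weights, bias))
```

Swapping the logistic regression for a linear SVM would change only the training loss (hinge loss instead of log-loss); the TF-IDF features stay the same.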
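The abstract also describes building a document vector from pre-trained word embeddings by taking the element-wise mean of the word vectors. A minimal sketch of that aggregation step follows, assuming embeddings stored in the plain-text GloVe format (one token per line followed by its float components). The file name in the commented-out line is an assumption about a locally downloaded GloVe file, and the three-dimensional `toy_vectors` are hypothetical values for illustration only. Paragraph Vector, which learns document vectors directly, is not shown.

```python
from typing import Dict, List

Vector = List[float]


def load_glove_text(path: str) -> Dict[str, Vector]:
    """Parse a GloVe-style text file: a token followed by its floats on each line."""
    vectors: Dict[str, Vector] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def mean_embedding(tokens: List[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Element-wise mean of in-vocabulary word vectors; all zeros if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# vectors = load_glove_text("glove.6B.100d.txt")  # assumed local path, 100-d GloVe
# Hypothetical 3-d embeddings standing in for real pre-trained vectors.
toy_vectors = {"great": [1.0, 0.0, 0.5], "value": [0.0, 1.0, 0.5]}

doc_vector = mean_embedding("great value !".split(), toy_vectors, dim=3)
print(doc_vector)  # out-of-vocabulary "!" is skipped
```

The resulting fixed-length vector can be fed to the same linear classifiers as the bag-of-words features. Averaging discards word order, which is one motivation for the sequence models (ConvNets, LSTM) that the abstract also compares.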
