Lexicon-Based Sentiment Analysis for Movie Review Tweets

2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS). Pub Date: 2019-09-01. DOI: 10.1109/AiDAS47888.2019.8970722

A. Azizan, Nurul Najwa SK Abdul Jamal, M. N. Abdullah, Masurah Mohamad, N. Khairudin

Citations: 8

Abstract

Sentiment analysis is a computational process that identifies subjective information in source material and classifies it as positive, negative or neutral. It can extract feelings and emotions from a sentence. This technology has been widely used to extract valuable information from people’s views on social media. Hence, this project aims to classify movie reviews into positive, negative and neutral polarity using a lexicon-based method, with R as the language and development framework. Twitter data is used as the source material. First, tweets were extracted using RStudio and the Twitter API. The data were then pre-processed by removing stop words and noise. Next, tokenization split the text into individual words, and each word was matched against vocabularies of positive and negative words. Finally, the sentiment analysis assigns each tweet a positive, negative or neutral polarity. The results were evaluated using the standard metrics of precision, recall, F1 score and accuracy. We found that the basic lexicon-based method classifies sentiment reasonably well, with 52% accuracy. Although this accuracy is not impressive, it is acceptable given the simplicity and minimal development cost of lexicon-based sentiment analysis on movie-related Twitter data.
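The pre-processing, tokenization and lexicon-matching steps described in the abstract can be sketched as follows. The authors worked in R; this is a minimal Python sketch under stated assumptions, not their implementation. The positive, negative and stop-word lists are tiny hypothetical placeholders rather than the vocabularies used in the paper, the noise-removal rules (retweet markers, URLs, @mentions, non-letter characters) are assumed, and the scoring rule (number of positive matches minus number of negative matches, with a score of zero mapped to neutral) is a common lexicon-based convention that the abstract does not confirm. The names `clean_tweet`, `tokenize` and `polarity` are hypothetical.

```python
import re

# Hypothetical word lists for illustration only; the paper's actual
# positive/negative vocabularies are not given in the abstract.
POSITIVE_WORDS = {"good", "great", "love", "amazing", "fun", "brilliant"}
NEGATIVE_WORDS = {"bad", "boring", "hate", "awful", "worst", "dull"}

# Hypothetical stop-word list; "movie" is treated as a domain stop word.
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "that", "and",
              "it", "of", "to", "movie"}


def clean_tweet(text: str) -> str:
    """Lower-case a tweet and replace assumed noise with spaces."""
    text = text.lower()
    text = re.sub(r"\brt\b", " ", text)        # retweet marker
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # anything not a-z or whitespace
    return text


def tokenize(text: str) -> list[str]:
    """Split cleaned text on whitespace and drop stop words."""
    return [word for word in text.split() if word not in STOP_WORDS]


def polarity(tweet: str) -> str:
    """Label a tweet by (# positive matches) - (# negative matches)."""
    tokens = tokenize(clean_tweet(tweet))
    score = (sum(t in POSITIVE_WORDS for t in tokens)
             - sum(t in NEGATIVE_WORDS for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweets = [
        "RT @fan: What a great, brilliant film! https://t.co/x",  # → positive
        "This movie was boring and the worst of the year",        # → negative
        "Watching the new movie tonight",                          # → neutral
    ]
    for tweet in tweets:
        print(polarity(tweet), "|", tweet)
```

Under this sign-of-score rule, a tweet with no lexicon hits, or with equal numbers of positive and negative hits, falls into the neutral class. Because matching is exact, inflected forms such as "loved" miss the entry "love" unless stemming or a larger lexicon is added.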

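The four evaluation metrics named in the abstract (precision, recall, F1 score and accuracy) can be computed for a three-class polarity task as sketched below. This Python sketch assumes that each tweet has a manually assigned gold label and that per-class precision, recall and F1 are macro-averaged (an unweighted mean over the three polarities); the abstract does not say how the authors aggregated the class-level scores. The function name `evaluate` is hypothetical, and the example labels are made up, not data from the paper.

```python
LABELS = ("positive", "negative", "neutral")


def evaluate(gold: list[str], predicted: list[str]) -> dict:
    """Per-class precision/recall/F1, their macro averages, and accuracy."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    pairs = list(zip(gold, predicted))
    report = {}
    for label in LABELS:
        # One-vs-rest counts for this class.
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro average: unweighted mean over the three classes (an assumption).
    for metric in ("precision", "recall", "f1"):
        report["macro_" + metric] = sum(report[c][metric] for c in LABELS) / len(LABELS)
    # Accuracy: fraction of tweets whose predicted label equals the gold label.
    report["accuracy"] = sum(g == p for g, p in pairs) / len(pairs)
    return report


if __name__ == "__main__":
    # Made-up labels, not data from the paper.
    gold = ["positive", "negative", "neutral", "positive", "negative"]
    predicted = ["positive", "neutral", "neutral", "negative", "negative"]
    result = evaluate(gold, predicted)
    print(result["accuracy"])            # → 0.6
    print(round(result["macro_f1"], 3))  # → 0.611
```

Accuracy alone can hide class imbalance (for example, many neutral tweets), which is why the per-class precision and recall are reported alongside it.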