A Novel Auto-Annotation Technique for Aspect Level Sentiment Analysis

IF 1.7 | CAS Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS | Cmc-computers Materials & Continua | Pub Date: 2022-01-01 | DOI: 10.32604/cmc.2022.020544

M. Aasim Qureshi, M. Asif, M. Fadzil Hassan, Ghulam Mustafa, Muhammad Khurram Ehsan, Aasim Ali, Unaza Sajid


Citations: 8

Abstract

In machine learning, sentiment analysis is a technique for finding and analyzing the sentiments hidden in text. Annotated data is a basic requirement for sentiment analysis, and this data is generally annotated manually. Manual annotation is a time-consuming, costly, and laborious process. To overcome these resource constraints, this research proposes a fully automated annotation technique for aspect-level sentiment analysis. The dataset is created from the reviews of the ten most popular songs on YouTube. Reviews of five aspects (voice, video, music, lyrics, and song) are extracted, and an N-gram-based technique is proposed. The complete dataset consists of 369,436 reviews, which took 173.53 s to annotate using the proposed technique, whereas manual annotation might have taken approximately 2.07 million seconds (575 h). To validate the proposed technique, one sub-dataset, Voice, is annotated both manually and with the proposed technique. Cohen's Kappa statistic is used to evaluate the degree of agreement between the two annotations. The high Kappa value (0.9571) shows a high level of agreement between the two. This validates that the annotation quality of the proposed technique is as good as that of manual annotation, at a far lower computational cost. This research also contributes by consolidating the guidelines for the manual annotation process.
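The agreement check described in the abstract can be illustrated with a minimal sketch of Cohen's Kappa for two label sequences. This is not the authors' code: the function name `cohens_kappa`, the label values, and the six-item toy sample are assumptions made purely for illustration, and the toy result is unrelated to the value reported in the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label; agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: manual labels vs. automatically assigned labels.
manual = ["pos", "pos", "neg", "neg", "pos", "neg"]
auto = ["pos", "pos", "neg", "pos", "pos", "neg"]
kappa = cohens_kappa(manual, auto)
print(f"kappa = {kappa:.4f}")
```

In this toy sample the two annotations agree on five of six items while chance agreement is 0.5, so Kappa comes out well below the raw agreement rate. That correction for chance is presumably why the statistic is preferred over plain accuracy when comparing annotations.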

Source journal

Cmc-computers Materials & Continua (Engineering & Technology - Materials Science: Multidisciplinary)

CiteScore: 5.30

Self-citation rate: 19.40%

Articles published: 345

Review time: 1 month

Journal description: This journal publishes original research papers in the areas of computer networks, artificial intelligence, big data management, software engineering, multimedia, cyber security, internet of things, materials genome, integrated materials science, data analysis, modeling, and the engineering of the design and manufacture of modern functional and multifunctional materials. Novel high-performance computing methods, big data analysis, and artificial intelligence that advance material technologies are especially welcome.