A Novel Auto-Annotation Technique for Aspect Level Sentiment Analysis

IF 2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Cmc-computers Materials & Continua | Pub Date: 2022-01-01 | DOI: 10.32604/cmc.2022.020544
M. Aasim Qureshi, M. Asif, M. Fadzil Hassan, Ghulam Mustafa, Muhammad Khurram Ehsan, Aasim Ali, Unaza Sajid
Citations: 8

Abstract

In machine learning, sentiment analysis is a technique for finding and analyzing the sentiments hidden in text. Annotated data is a basic requirement for sentiment analysis, and this data is generally annotated manually. Manual annotation is a time-consuming, costly and laborious process. To overcome these resource constraints, this research proposes a fully automated annotation technique for aspect-level sentiment analysis. The dataset is created from the reviews of the ten most popular songs on YouTube. Reviews of five aspects (voice, video, music, lyrics and song) are extracted. An N-gram based technique is proposed. The complete dataset consists of 369,436 reviews, which took 173.53 s to annotate using the proposed technique, whereas manual annotation might have taken approximately 2.07 million seconds (575 h). To validate the proposed technique, a sub-dataset, Voice, is annotated both manually and with the proposed technique. Cohen's Kappa statistic is used to evaluate the degree of agreement between the two annotations. The high Kappa value (0.9571) shows a high level of agreement between the two. This validates that the annotation quality of the proposed technique is as good as manual annotation, at far lower computational cost. This research also contributes by consolidating the guidelines for the manual annotation process.
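The agreement measure named in the abstract can be illustrated with a minimal sketch. The function name `cohen_kappa` and the toy labels below are assumptions made for illustration only; they are not the paper's code or data. Cohen's Kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance, as (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (not from the paper's Voice sub-dataset).
manual = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
auto = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

kappa = cohen_kappa(manual, auto)
print(f"kappa = {kappa:.4f}")
```

In this toy case the two label lists agree on 9 of 10 items, yet Kappa is noticeably below 0.9 because part of that agreement is expected by chance. This is why a Kappa close to 1, such as the value reported above, is a stronger statement than raw percentage agreement.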
Source journal
Cmc-computers Materials & Continua (Engineering & Technology - Materials Science: Multidisciplinary)
CiteScore: 5.30
Self-citation rate: 19.40%
Articles published: 345
Review time: 1 month
About the journal: This journal publishes original research papers in the areas of computer networks, artificial intelligence, big data management, software engineering, multimedia, cyber security, internet of things, materials genome, integrated materials science, data analysis, modeling, and the engineering of design and manufacturing of modern functional and multifunctional materials. Novel high-performance computing methods, big data analysis, and artificial intelligence that advance material technologies are especially welcome.