Linguistic Feature-based Classification for Anger and Anticipation using Machine Learning

News. Phi Delta Epsilon. Pub Date: 2022-01-01. DOI: 10.5220/0011289300003277

K. Ramakrishnan, Vimala Balakrishnan, Kumanan Govaichelvan



Abstract

The growing volume of online discourse enables the development of emotion mining models using natural language processing techniques. However, language diversity and cultural disparity alter the sentiment orientation of words depending on the community and context. Therefore, this study investigates the impact of linguistic features, namely lexical and syntactic features, on predicting the presence of two emotions, anger and anticipation, among Malaysian YouTube users. Term Frequency-Inverse Document Frequency (TF-IDF), unigrams, bigrams and part-of-speech tags were used as features to observe the classification performance. The dataset used in this study contains 2500 YouTube comments by Malaysian users on 46 Covid-19 related videos. Comments were extracted from three prominent Malaysian-centric English news channels, Channel News Asia (CNA), The Star News, and New Straits Times, and were posted between 16 March 2020 and 30 April 2020 (i.e., the first lockdown phase). Six classification algorithms were tested: Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, K-Nearest Neighbour and Multinomial Naive Bayes. The results indicate that Support Vector Machine with TF-IDF provided the best performance, achieving accuracies of 76% and 73% for anger and anticipation, respectively.
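As an illustration of the lexical features named in the abstract, the following minimal sketch computes TF-IDF weights over unigrams and bigrams using only the Python standard library. It is not the authors' implementation: the function names (`ngrams`, `extract_terms`, `tfidf_vectors`), the whitespace tokenisation, the idf variant, and the toy comments are all assumptions made for this example, and the abstract does not state which weighting scheme or tooling the study used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_terms(text):
    """Lowercase, split on whitespace, and emit unigrams plus bigrams."""
    tokens = text.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)


def tfidf_vectors(documents):
    """Compute one {term: weight} dict per document.

    Assumed weighting: raw term count times idf = ln(N / df) + 1,
    which is one common variant among several.
    """
    term_lists = [extract_terms(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))

    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Invented toy comments, not taken from the study's dataset.
comments = [
    "stay home and stay safe",
    "why is the lockdown so long",
    "hope the lockdown ends soon",
]
vectors = tfidf_vectors(comments)
```

Each resulting dictionary is a sparse feature vector for one comment. Terms that occur in fewer comments receive a larger weight, and such vectors are the kind of input a classifier like a Support Vector Machine would be trained on.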
