A Deviation based Ensemble Algorithm for Sarcasm Detection in Online Comments
Anurita Bose, Deepanjali Pandit, Nidhi Prakash, Ashwini M. Joshi
2023 Fifth International Conference on Electrical, Computer and Communication Technologies (ICECCT), published 2023-02-22
DOI: https://doi.org/10.1109/ICECCT56650.2023.10179724
Abstract
Sarcasm is the use of irony to mock or convey contempt, typically through words that mean the opposite of what the speaker intends. In online forums, sarcastic comments tend to cause misunderstandings between parties and obscure users' true intentions, which makes ambiguity one of the prime challenges in detecting sarcasm. Another challenge is the rapid growth of language vocabularies as new slang words appear every day. In addition, emojis in online text can strongly influence the polarity of a sentence by inducing a sarcastic tone. These difficulties make sarcasm a particularly demanding sentiment to detect. This paper explores the statistical significance of various deep learning models for detecting sarcasm in online comments containing emojis. On the binary classification task, GRU achieves an accuracy of 73.44% with an F1-score of 73.96%. The proposed ensemble-based approach yields an accuracy of 74.41% for the combination of LSTM and GRU, which is comparable to the accuracy achieved with conventional ensemble techniques such as max-voting and averaging. Twenty-six hybrid combinations of deep learning models were explored and the best-performing ones were identified. CNN and Global Average Pooling 1D are two other architectures that were explored.
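The two conventional baselines named in the abstract, max-voting and averaging, can be illustrated with a minimal sketch. It does not reproduce the paper's deviation-based ensemble, which the abstract does not describe. The function names, the 0.5 threshold, the tie-breaking rule, and the sample probabilities are all assumptions made for illustration; each inner list stands for one model's predicted probability that a comment is sarcastic.

```python
from typing import List, Sequence


def average_ensemble(prob_sets: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[int]:
    """Average the models' probabilities per comment, then threshold."""
    n_models = len(prob_sets)
    averaged = [sum(column) / n_models for column in zip(*prob_sets)]
    return [1 if p >= threshold else 0 for p in averaged]


def max_voting_ensemble(prob_sets: Sequence[Sequence[float]],
                        threshold: float = 0.5) -> List[int]:
    """Each model casts a hard vote; a strict majority labels the comment sarcastic.

    Assumption: a tie (possible with an even number of models) falls back
    to the non-sarcastic label 0.
    """
    labels = []
    for column in zip(*prob_sets):
        votes = sum(1 for p in column if p >= threshold)
        labels.append(1 if 2 * votes > len(column) else 0)
    return labels


# Hypothetical per-comment sarcasm probabilities from three models.
lstm_probs = [0.90, 0.40, 0.20, 0.55]
gru_probs = [0.80, 0.70, 0.10, 0.45]
cnn_probs = [0.60, 0.45, 0.30, 0.52]

models = [lstm_probs, gru_probs, cnn_probs]
print("averaging: ", average_ensemble(models))
print("max-voting:", max_voting_ensemble(models))
```

On the second comment the two rules disagree: one confident model pulls the average above the threshold, while the majority of hard votes stays below it. This is the kind of divergence that motivates comparing ensemble strategies.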