Sarcasm detection in online comments using machine learning

Information Discovery and Delivery (IF 2.6, Q2, Information Science & Library Science); Pub Date: 2023-07-31; DOI: 10.1108/idd-01-2023-0002

Danny Sandor, Marina Bagić Babac


Citation count: 0

Abstract

Purpose
Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the approach of machine and deep learning.

Design/methodology/approach
For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, linear support vector and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.

Findings
The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art model in natural language processing, namely, BERT-based model, outperformed other machine and deep learning models.

Originality/value
This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using the data set of 1.3 million comments from social media.
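To make the classical side of the comparison concrete, the following is a minimal sketch of one of the model families named in the abstract: logistic regression over bag-of-words counts. It is not the authors' pipeline. The toy comments, the whitespace tokenizer, the learning rate and the epoch count are all hypothetical choices made for illustration, and the model is written from scratch in plain Python rather than with any particular library.

```python
import math
from collections import Counter

# Hypothetical toy comments (1 = sarcastic, 0 = not sarcastic); not from the paper's data set.
TRAIN = [
    ("oh great another monday", 1),
    ("wow i just love waiting in line", 1),
    ("yeah right that will totally work", 1),
    ("sure because that went so well last time", 1),
    ("the update fixed the login bug", 0),
    ("thanks for sharing the link", 0),
    ("i watched the game last night", 0),
    ("the new phone has a good camera", 0),
]

def tokenize(text):
    # Assumed pre-processing: lowercase and split on whitespace.
    return text.lower().split()

def build_vocab(samples):
    # Map each distinct token to a feature index.
    vocab = {}
    for text, _ in samples:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    # Sparse bag-of-words vector: {feature index: token count}.
    counts = Counter(tokenize(text))
    return {vocab[t]: float(c) for t, c in counts.items() if t in vocab}

def predict_proba(x, weights, bias):
    # Sigmoid of the linear score gives P(sarcastic | comment).
    z = bias + sum(weights[i] * v for i, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(data, weights, bias):
    # Mean binary cross-entropy, clipped to avoid log(0).
    total = 0.0
    for x, y in data:
        p = min(max(predict_proba(x, weights, bias), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, n_features, lr=0.1, epochs=500):
    # Full-batch gradient descent on the mean log loss.
    weights = [0.0] * n_features
    bias = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            err = predict_proba(x, weights, bias) - y
            grad_b += err
            for i, v in x.items():
                grad_w[i] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

vocab = build_vocab(TRAIN)
data = [(vectorize(text, vocab), label) for text, label in TRAIN]
initial_loss = log_loss(data, [0.0] * len(vocab), 0.0)
weights, bias = train(data, len(vocab))
final_loss = log_loss(data, weights, bias)
print(f"log loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
print("P(sarcastic) for 'oh great another bug':",
      round(predict_proba(vectorize("oh great another bug", vocab), weights, bias), 3))
```

A word-count model of this kind sees only surface vocabulary, which is one plausible reason the context-aware deep learning models in the study fared better on a task the abstract describes as largely dependent on context.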




Source journal

Information Discovery and Delivery (Information Science & Library Science)

CiteScore

5.40

Self-citation rate

4.80%

Publication count

Journal description: Information Discovery and Delivery covers information discovery and access for digital information researchers. This includes educators, knowledge professionals in education and cultural organisations, knowledge managers in media, health care and government, as well as librarians. The journal publishes research and practice that explore the digital information supply chain, i.e. transport, flows, tracking, exchange and sharing, including within and between libraries. It is also interested in digital information capture, packaging and storage by 'collectors' of all kinds. Information is widely defined, including but not limited to: records, documents, learning objects, visual and sound files, data and metadata, and user-generated content.