A semantic approach for sarcasm identification for preventing fake news spreading on social networks

International Journal of Information Technology. Pub Date: 2024-08-27. DOI: 10.1007/s41870-024-02156-7

Fethi Fkih, Delel Rhouma, Hajar Alghofaily


Citations: 0

Abstract

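Misinterpreting satirical posts can contribute to the spread of misinformation and can become a source of what is commonly called "fake news". Satire is a form of humor that uses exaggeration, irony, or ridicule to comment on or criticize a particular subject. Although satirical content is not meant to be taken literally, readers sometimes misinterpret it, which leads to the dissemination of false information. Preventing people from misinterpreting satirical posts can therefore reduce the spread of fake news. However, sarcasm recognition is considered a challenging task in the Sentiment Analysis domain. Irony and sarcasm convey sharp, bitter remarks or criticism in ambiguous natural language, so they can be difficult to recognize even for humans, and the task is harder still for an automated model. In this paper, we carry out an in-depth literature review of the main approaches used for sarcasm detection, especially those based on Machine Learning (ML) models. We then conduct a study with a series of binary classification models that exploit a variety of statistical and semantic features. Our experiments are carried out on a Twitter dataset obtained from SemEval-2018 Task 3. An extensive evaluation of each set of classifiers demonstrates the efficiency of our proposed model in detecting and identifying sarcastic content in tweets. Finally, we compare the performance of machine learning models that use our proposed features with our baseline and with the state of the art on the same dataset. Using a Support Vector Machine (SVM) model with the proposed features, we outperform the state of the art and obtain an accuracy of 79.46% with an F-score of 79.66%, which is considered a promising result in this field.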
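The abstract reports an accuracy and an F-score for a binary sarcasm classifier. As a minimal sketch of how those two metrics are usually computed from predictions, the snippet below assumes labels in {0, 1} with 1 meaning "sarcastic" and computes the F-score on the positive class. The abstract does not say which F-score variant the authors used, so that choice is an assumption, and the function name `binary_metrics` and the toy labels are hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = sarcastic).

    Assumption: F1 is computed on the positive class only; the paper may
    use a different averaging scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy labels, not data from SemEval-2018 Task 3.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Accuracy and F-score can diverge when the classes are imbalanced, which is one reason both figures are commonly reported together.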



