基于新闻标题数据集的讽刺检测

Rishabh Misra , Prahal Arora
{"title":"基于新闻标题数据集的讽刺检测","authors":"Rishabh Misra ,&nbsp;Prahal Arora","doi":"10.1016/j.aiopen.2023.01.001","DOIUrl":null,"url":null,"abstract":"<div><p>Sarcasm has been an elusive concept for humans. Due to interesting linguistic properties, sarcasm detection has gained traction of the Natural Language Processing (NLP) research community in the past few years. However, the task of predicting sarcasm in a text remains a difficult one for machines as well, and there are limited insights into what makes a sentence sarcastic. Past studies in sarcasm detection either use large scale datasets collected using tag-based supervision or small scale manually annotated datasets. The former category of datasets are noisy in terms of labels and language, whereas the latter category of datasets do not have enough instances to train deep learning models reliably despite having high-quality labels. To overcome these shortcomings, we introduce a high-quality and relatively larger-scale dataset which is a collection of news headlines from a sarcastic news website and a real news website. We describe the unique aspects of our dataset and compare its various characteristics with other benchmark datasets in sarcasm detection domain. Furthermore, we produce insights into what constitute as sarcasm in a text using a Hybrid Neural Network architecture. First released in 2019, we dedicate a section on how the NLP research community has extensively relied upon our contributions to push the state of the art further in the sarcasm detection domain. Lastly, we make the dataset as well as framework implementation publicly available to facilitate continued research in this domain.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 13-18"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Sarcasm detection using news headlines dataset\",\"authors\":\"Rishabh Misra ,&nbsp;Prahal Arora\",\"doi\":\"10.1016/j.aiopen.2023.01.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Sarcasm has been an elusive concept for humans. Due to interesting linguistic properties, sarcasm detection has gained traction of the Natural Language Processing (NLP) research community in the past few years. However, the task of predicting sarcasm in a text remains a difficult one for machines as well, and there are limited insights into what makes a sentence sarcastic. Past studies in sarcasm detection either use large scale datasets collected using tag-based supervision or small scale manually annotated datasets. The former category of datasets are noisy in terms of labels and language, whereas the latter category of datasets do not have enough instances to train deep learning models reliably despite having high-quality labels. To overcome these shortcomings, we introduce a high-quality and relatively larger-scale dataset which is a collection of news headlines from a sarcastic news website and a real news website. We describe the unique aspects of our dataset and compare its various characteristics with other benchmark datasets in sarcasm detection domain. Furthermore, we produce insights into what constitute as sarcasm in a text using a Hybrid Neural Network architecture. First released in 2019, we dedicate a section on how the NLP research community has extensively relied upon our contributions to push the state of the art further in the sarcasm detection domain. Lastly, we make the dataset as well as framework implementation publicly available to facilitate continued research in this domain.</p></div>\",\"PeriodicalId\":100068,\"journal\":{\"name\":\"AI Open\",\"volume\":\"4 \",\"pages\":\"Pages 13-18\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AI Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666651023000013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI Open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666651023000013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

讽刺对人类来说一直是一个难以捉摸的概念。由于有趣的语言特性,讽刺检测在过去几年中受到了自然语言处理(NLP)研究界的关注。然而,对于机器来说,预测文本中的讽刺仍然是一项困难的任务,而且对一个句子的讽刺原因的见解有限。过去的讽刺检测研究要么使用使用基于标签的监督收集的大规模数据集,要么使用小规模手动注释的数据集。前一类数据集在标签和语言方面是有噪声的,而后一类数据集中尽管有高质量的标签,但没有足够的实例来可靠地训练深度学习模型。为了克服这些缺点,我们引入了一个高质量且规模相对较大的数据集,该数据集是来自讽刺新闻网站和真实新闻网站的新闻标题的集合。我们描述了我们数据集的独特之处,并将其各种特征与讽刺检测领域的其他基准数据集进行了比较。此外,我们使用混合神经网络架构来深入了解文本中的讽刺构成。我们于2019年首次发布,专门介绍了NLP研究界如何广泛依赖我们的贡献,进一步推动讽刺检测领域的最新技术。最后,我们公开了数据集和框架实现,以促进该领域的持续研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Sarcasm detection using news headlines dataset

Sarcasm has been an elusive concept for humans. Due to interesting linguistic properties, sarcasm detection has gained traction of the Natural Language Processing (NLP) research community in the past few years. However, the task of predicting sarcasm in a text remains a difficult one for machines as well, and there are limited insights into what makes a sentence sarcastic. Past studies in sarcasm detection either use large scale datasets collected using tag-based supervision or small scale manually annotated datasets. The former category of datasets are noisy in terms of labels and language, whereas the latter category of datasets do not have enough instances to train deep learning models reliably despite having high-quality labels. To overcome these shortcomings, we introduce a high-quality and relatively larger-scale dataset which is a collection of news headlines from a sarcastic news website and a real news website. We describe the unique aspects of our dataset and compare its various characteristics with other benchmark datasets in sarcasm detection domain. Furthermore, we produce insights into what constitute as sarcasm in a text using a Hybrid Neural Network architecture. First released in 2019, we dedicate a section on how the NLP research community has extensively relied upon our contributions to push the state of the art further in the sarcasm detection domain. Lastly, we make the dataset as well as framework implementation publicly available to facilitate continued research in this domain.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
45.00
自引率
0.00%
发文量
0
期刊最新文献
GPT understands, too Adaptive negative representations for graph contrastive learning PM2.5 forecasting under distribution shift: A graph learning approach Enhancing neural network classification using fractional-order activation functions CPT: Colorful Prompt Tuning for pre-trained vision-language models
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1