Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

arXiv - CS - Social and Information Networks. Pub Date: 2024-09-09. DOI: arxiv-2409.05292

Nirmalya Thakur


Citations: 0

Abstract

The world is currently experiencing an outbreak of mpox, which the WHO has declared a Public Health Emergency of International Concern. No prior work in social media mining has focused on developing a dataset of Instagram posts about the mpox outbreak. The work presented in this paper addresses this research gap and makes two scientific contributions to this field. First, it presents a multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. The dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram posts about mpox in 52 languages. For each post, the Post ID, Post Description, Date of publication, language, and an English translation of the post (produced with the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were performed. This process classified each post into (i) one of the sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no anxiety/stress detected. These results are presented as separate attributes in the dataset. Second, this paper presents the results of the sentiment analysis, hate speech analysis, and anxiety or stress analysis. The shares of the sentiment classes fear, surprise, joy, sadness, anger, disgust, and neutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and 50.64%, respectively. In terms of hate speech detection, 95.75% of the posts did not contain hate and the remaining 4.25% did. Finally, 72.05% of the posts did not indicate any anxiety/stress, and the remaining 27.95% represented some form of anxiety/stress.
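The reported class shares can be sanity-checked with a few lines of Python. The sketch below is illustrative only and is not the author's pipeline: the function names `class_distribution` and `approximate_counts` and the toy label list are hypothetical, while the percentages and the total of 60,127 posts are copied from the abstract. It shows how per-class shares could be computed from a column of labels, and how the reported percentages map back to approximate post counts.

```python
from collections import Counter

# Values copied from the abstract.
TOTAL_POSTS = 60127
REPORTED_SENTIMENT = {
    "fear": 27.95, "surprise": 2.57, "joy": 8.69, "sadness": 5.94,
    "anger": 2.69, "disgust": 1.53, "neutral": 50.64,
}
REPORTED_HATE = {"not hate": 95.75, "hate": 4.25}
REPORTED_ANXIETY = {"no anxiety/stress": 72.05, "anxiety/stress": 27.95}


def class_distribution(labels):
    """Return each label's share of the input, in percent (two decimals)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def approximate_counts(percentages, total=TOTAL_POSTS):
    """Convert reported percentages back into approximate post counts."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


if __name__ == "__main__":
    # Toy labels (hypothetical, not taken from the dataset).
    toy_labels = ["fear", "neutral", "neutral", "joy"]
    print(class_distribution(toy_labels))

    # Each reported distribution should sum to roughly 100%, up to rounding.
    for name, reported in [
        ("sentiment", REPORTED_SENTIMENT),
        ("hate", REPORTED_HATE),
        ("anxiety/stress", REPORTED_ANXIETY),
    ]:
        print(name, round(sum(reported.values()), 2), approximate_counts(reported))
```

Because the abstract rounds each share to two decimals, the recovered counts are approximations, and a distribution may sum to slightly more or less than 100%.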
