UrduHope: Analysis of hope and hopelessness in Urdu texts
Fazlourrahman Balouchzahi, Sabur Butt, Maaz Amjad, Grigori Sidorov, Alexander Gelbukh
Knowledge-Based Systems, Volume 308, Article 112746 (published 2024-11-22)
DOI: 10.1016/j.knosys.2024.112746
https://www.sciencedirect.com/science/article/pii/S0950705124013807
Journal impact factor: 7.2; JCR: Q1 (Computer Science, Artificial Intelligence)
Abstract
Hope is a crucial aspect of human psychology that has received considerable attention for its role in helping people face life's challenges. However, current research focuses predominantly on hope as positive anticipation and overlooks its counterpart, hopelessness. This paper addresses that gap by presenting an expanded framework for analyzing hope speech in social media that incorporates both hope and hopelessness. Drawing on insights from psychology and Natural Language Processing (NLP), we argue that a comprehensive understanding of human emotions requires considering both constructs. We introduce hopelessness as a distinct category in hope speech analysis and develop a novel dataset for Urdu, an underrepresented language in NLP research. We propose a semi-supervised annotation procedure that uses Large Language Models (LLMs) alongside human annotators to label the dataset, and we explore various learning approaches for hope speech detection, including traditional machine learning models, neural networks, and state-of-the-art transformers. The findings demonstrate the effectiveness of different learning approaches in capturing the nuances of hope speech in Urdu social media discourse. The hope speech detection task is modeled as two subtasks: a binary classification of Urdu tweets into Hope and Not Hope classes, and a multiclass classification of Urdu tweets into Generalized Hope, Realistic Hope, Unrealistic Hope, Hopelessness, and Not Hope (Neutral) categories. The best binary classification results were obtained with Logistic Regression (LR), with an averaged macro F1 score of 0.7593; in the multiclass experiments, transformers outperformed the other approaches, with an averaged macro F1 score of 0.4801.
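Both reported scores are macro F1 values, which weight every class equally regardless of how many tweets it contains. The sketch below shows how such a score can be computed for either subtask. It is an illustration only, not the authors' evaluation code: the function name `macro_f1`, the label strings, and the toy predictions are all assumptions made for the example, and the abstract does not say how the "averaged" score was aggregated (for instance across folds or runs).

```python
from collections import Counter

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy binary example (invented labels, not taken from the dataset).
y_true = ["Hope", "Hope", "Not Hope", "Not Hope"]
y_pred = ["Hope", "Not Hope", "Not Hope", "Not Hope"]
print(round(macro_f1(y_true, y_pred, ["Hope", "Not Hope"]), 4))  # 0.7333
```

In the toy run, the Hope class scores 2/3 and the Not Hope class scores 0.8, so the macro average is about 0.733. The same function applies to the five-class subtask by passing the five category names as `labels`; because rare classes count as much as frequent ones, a multiclass macro F1 is typically harder to raise than a binary one.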
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.