Evaluating Arabic Emotion Recognition Task Using ChatGPT Models: A Comparative Analysis between Emotional Stimuli Prompt, Fine-Tuning, and In-Context Learning

Journal of Theoretical and Applied Electronic Commerce Research (IF 5.1; JCR Q1, Business; Zone 3, Management) Pub Date: 2024-05-14 DOI: 10.3390/jtaer19020058

El Habib Nfaoui, Hanane Elfaik


Citation count: 0

Abstract

Textual emotion recognition (TER) has significant commercial potential: it can be used to monitor the reputation of a brand or business, understand customer satisfaction, and personalize recommendations. It is a natural language processing task that identifies and classifies the emotions, such as anger, happiness, and surprise, conveyed in a piece of text (product reviews, tweets, and comments). Despite advances in deep learning, and in transformer architectures in particular, Arabic-focused models for emotion classification have not achieved satisfactory accuracy. This is mainly due to the morphological richness, agglutination, and dialectal variation of Arabic and the scarcity of Arabic datasets, as well as the characteristics of user-generated text, such as noisiness, shortness, and informal language. This study aims to evaluate the effectiveness of large language models (LLMs) on Arabic multi-label emotion classification. We evaluated GPT-3.5 Turbo and GPT-4 in three settings: in-context learning, emotional stimuli prompting, and fine-tuning. The ultimate objective is to determine whether these LLMs, which have multilingual capabilities, can improve performance on this task, and to encourage their use in settings such as e-commerce. The experimental results indicate that the fine-tuned GPT-3.5 Turbo model achieved an accuracy of 62.03%, a micro-averaged F1-score of 73%, and a macro-averaged F1-score of 62%, establishing a new state of the art for Arabic multi-label emotion recognition.
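The three reported metrics can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The abstract does not define "accuracy" for the multi-label setting, so the sketch assumes the Jaccard-based accuracy often used for multi-label emotion classification. The function names, the three-emotion label set, and the toy gold and predicted labels are hypothetical and serve only to show how micro- and macro-averaging differ.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Jaccard accuracy, micro-F1, and macro-F1 for multi-label predictions."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")

    counts = {lab: [0, 0, 0] for lab in labels}  # per label: [tp, fp, fn]
    jaccard_sum = 0.0
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in g and lab in p:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1
        union = g | p
        # Assumption: an example with no gold and no predicted labels scores 1.
        jaccard_sum += len(g & p) / len(union) if union else 1.0

    # Micro-averaging pools the counts over all labels before computing F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    # Macro-averaging computes F1 per label, then takes the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    return {
        "jaccard_accuracy": jaccard_sum / len(gold),
        "micro_f1": f1(tp, fp, fn),
        "macro_f1": macro,
    }


# Hypothetical toy data, not taken from the paper.
labels = ["anger", "happiness", "surprise"]
gold = [{"anger"}, {"happiness", "surprise"}, {"surprise"}, set()]
pred = [{"anger"}, {"happiness"}, {"anger"}, set()]
scores = multilabel_scores(gold, pred, labels)
print(scores)
```

On this toy data the macro-averaged F1 falls below the micro-averaged F1 because one label ("surprise") is never predicted correctly and macro-averaging weights every label equally. A similar gap appears between the paper's reported 73% micro-F1 and 62% macro-F1, although the abstract does not state its cause.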


Source journal

Journal of Theoretical and Applied Electronic Commerce Research (Business)

CiteScore: 9.50

Self-citation rate: 3.60%

About the journal: The Journal of Theoretical and Applied Electronic Commerce Research (JTAER) was created to give researchers, academics, and other professionals an agile and flexible channel of communication in which to share and debate new ideas and emerging technologies in this rapidly evolving field. Business practices; social, cultural, and legal concerns; personal privacy and security; communications technologies; and mobile connectivity are among the important elements of electronic commerce and are becoming ever more relevant in everyday life. JTAER aims to help extend and improve the use of electronic commerce for the benefit of society.