CNN-Based Models for Emotion and Sentiment Analysis Using Speech Data

IF 1.8 | Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | ACM Transactions on Asian and Low-Resource Language Information Processing | Pub Date: 2024-08-08 | DOI: 10.1145/3687303

Anjum Madan, Devender Kumar

Citations: 0

Abstract

This study presents an in-depth Sentiment Analysis (SA) grounded in the emotions present in speech signals. Today, web-based applications of every kind, from social media platforms and video-sharing sites to e-commerce applications, support Human-Computer Interfaces (HCIs). These applications let users share their experiences in many forms, such as text, audio, video, and GIFs. Speech is the most natural and fundamental form of self-expression. Speech-Based Sentiment Analysis (SBSA) is the task of extracting insights from speech signals; it classifies a statement as neutral, negative, or positive. Speech Emotion Recognition (SER), by contrast, classifies speech signals into the following emotions: disgust, fear, sadness, anger, happiness, and neutral. It is therefore necessary to recognize both the sentiment and the depth of the emotions in speech signals. To this end, the proposed methodology defines a text-oriented SA model that combines an embedding layer with CNN and Bi-LSTM techniques and is applied to the text obtained from speech signals; this model achieves an accuracy of 84.49%. The methodology also includes an Emotion Analysis (EA) model based on a CNN that identifies the type of emotion present in the speech signal, with an accuracy of 95.12%. The presented architecture can also be applied to other domains, such as product review systems, video recommendation systems, education, health, and security.
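The abstract describes the text-oriented SA model only at the level of its components: an embedding layer, a CNN, and a Bi-LSTM applied to text obtained from speech. As a rough illustration of how such components can be chained, the following is a minimal, untrained forward-pass sketch in plain Python. It is not the authors' implementation. Every concrete choice here is a hypothetical assumption: the toy vocabulary, the sizes `EMB_DIM`, `N_FILTERS`, `KERNEL`, and `HIDDEN`, the max-pooling step between the CNN and the Bi-LSTM, the use of each direction's final hidden state, and the random weights. Speech-to-text conversion and training are omitted.

```python
import math
import random

# Hypothetical toy settings for illustration only; not the paper's configuration.
VOCAB = {"<pad>": 0, "<unk>": 1, "the": 2, "service": 3, "was": 4, "great": 5, "terrible": 6}
EMB_DIM, N_FILTERS, KERNEL, HIDDEN = 8, 6, 3, 5
LABELS = ("negative", "neutral", "positive")

rng = random.Random(0)  # fixed seed so the random (untrained) weights are reproducible


def rand_matrix(rows, cols):
    # Small random weights; a real model would learn these during training.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


EMBED = rand_matrix(len(VOCAB), EMB_DIM)  # embedding layer: one vector per token id
CONV_W = [rand_matrix(KERNEL, EMB_DIM) for _ in range(N_FILTERS)]  # one KERNEL x EMB_DIM filter each


def conv1d_relu(seq):
    # "Valid" 1-D convolution over time (no padding, stride 1), then ReLU.
    out = []
    for t in range(len(seq) - KERNEL + 1):
        window = seq[t:t + KERNEL]
        feats = []
        for filt in CONV_W:
            s = sum(w * x for w_row, x_row in zip(filt, window) for w, x in zip(w_row, x_row))
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def max_pool(seq, size=2):
    # Non-overlapping max pooling over time; a trailing partial window is kept.
    return [[max(col) for col in zip(*seq[i:i + size])] for i in range(0, len(seq), size)]


def make_lstm():
    # One weight matrix per gate (input, forget, candidate, output) acting on [x; h]; biases omitted.
    return {gate: rand_matrix(HIDDEN, N_FILTERS + HIDDEN) for gate in "ifgo"}


def run_lstm(params, seq):
    # Standard LSTM recurrence; returns the final hidden state.
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x in seq:
        xh = x + h  # list concatenation [x; h]
        i = [sigmoid(v) for v in matvec(params["i"], xh)]
        f = [sigmoid(v) for v in matvec(params["f"], xh)]
        g = [math.tanh(v) for v in matvec(params["g"], xh)]
        o = [sigmoid(v) for v in matvec(params["o"], xh)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


FWD, BWD = make_lstm(), make_lstm()  # separate weights for each direction
OUT_W = rand_matrix(len(LABELS), 2 * HIDDEN)  # dense layer over the concatenated states


def predict(text):
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]
    ids += [VOCAB["<pad>"]] * max(0, KERNEL - len(ids))  # pad so at least one conv window fits
    feats = max_pool(conv1d_relu([EMBED[i] for i in ids]))
    # Bi-LSTM: read the pooled features forwards and backwards, then concatenate the final states.
    h = run_lstm(FWD, feats) + run_lstm(BWD, feats[::-1])
    return dict(zip(LABELS, softmax(matvec(OUT_W, h))))


scores = predict("the service was great")
print(max(scores, key=scores.get), scores)
```

Because the weights are random, the printed scores carry no meaning until the model is trained. The sketch only shows the data flow: token ids → embedding vectors → convolutional n-gram features → pooled sequence → bidirectional recurrent summary → softmax over the three sentiment classes.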

Source Journal

ACM Transactions on Asian and Low-Resource Language Information Processing (subject area: Computer Science - General Computer Science)

CiteScore: 3.60
Self-citation rate: 15.00%
Articles published: 241

Journal description: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high-quality original archival papers and technical notes on the computation and processing of information in Asian languages and in low-resource languages of Africa, Australasia, Oceania, and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g., parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech Processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.

Papers that deal with theory, systems design, evaluation, and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and practical significance of the reported research.