{"title":"A Novel Hybrid Attention-Based Dilated Network for Depression Classification Model from Multimodal Data Using Improved Heuristic Approach","authors":"B. Manjulatha, Suresh Pabboju","doi":"10.1142/s0219467826500105","DOIUrl":null,"url":null,"abstract":"Automatic depression classification from multimodal input data is a challenging task. Modern methods use paralinguistic information such as audio and video signals. Using linguistic information such as speech signals and text data for depression classification is a complicated task in deep learning models. Best audio and video features are built to produce a dependable depression classification system. Textual signals related to depression classification are analyzed using text-based content data. Moreover, to increase the achievements of the depression classification system, audio, visual, and text descriptors are used. So, a deep learning-based depression classification model is developed to detect the person with depression from multimodal data. The EEG signal, Speech signal, video, and text are gathered from standard databases. Four stages of feature extraction take place. In the first stage, the features from the decomposed EEG signals are attained by the empirical mode decomposition (EMD) method, and features are extracted by means of linear and nonlinear feature extraction. In the second stage, the spectral features of the speech signals from the Mel-frequency cepstral coefficients (MFCC) are extracted. In the third stage, the facial texture features from the input video are extracted. In the fourth stage of feature extraction, the input text data are pre-processed, and from the pre-processed data, the textual features are extracted by using the Transformer Net. All four sets of features are optimally selected and combined with the optimal weights to get the weighted fused features using the enhanced mountaineering team-based optimization algorithm (EMTOA). The optimal weighted fused features are finally given to the hybrid attention-based dilated network (HADN). The HDAN is developed by combining temporal convolutional network (TCN) with bidirectional long short-term memory (Bi-LSTM). The parameters in the HDAN are optimized with the assistance of the developed EMTOA algorithm. At last, the classified output of depression is obtained from the HDAN. The efficiency of the developed deep learning HDAN is validated by comparing it with various traditional classification models.","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":null,"pages":null},"PeriodicalIF":0.8000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Image and Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s0219467826500105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Abstract
Automatic depression classification from multimodal input data is a challenging task. Modern methods rely on paralinguistic information such as audio and video signals, while exploiting linguistic information such as speech signals and text data remains difficult for deep learning models. A dependable depression classification system therefore requires strong audio and video features, together with textual cues analyzed from text-based content. To further improve performance, audio, visual, and text descriptors are combined. Accordingly, a deep learning-based depression classification model is developed to detect persons with depression from multimodal data. EEG signals, speech signals, video, and text are gathered from standard databases, and feature extraction proceeds in four stages. In the first stage, the EEG signals are decomposed by the empirical mode decomposition (EMD) method, and linear and nonlinear features are extracted from the decomposed components. In the second stage, spectral features of the speech signals are extracted as Mel-frequency cepstral coefficients (MFCC). In the third stage, facial texture features are extracted from the input video. In the fourth stage, the input text data are pre-processed, and textual features are extracted from the pre-processed data using a Transformer network. The four feature sets are optimally selected and combined with optimal weights into weighted fused features using the enhanced mountaineering team-based optimization algorithm (EMTOA). The weighted fused features are finally passed to the hybrid attention-based dilated network (HADN), which combines a temporal convolutional network (TCN) with bidirectional long short-term memory (Bi-LSTM). The parameters of the HADN are optimized with the assistance of the developed EMTOA, and the classified depression output is obtained from the HADN. The efficiency of the developed deep learning HADN is validated by comparing it with various traditional classification models. Illustrative sketches of the main pipeline stages follow.
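Stage 1 (EEG): the abstract specifies EMD decomposition followed by linear and nonlinear feature extraction but does not name an implementation or the exact feature set. A minimal sketch, assuming the PyEMD package and using per-IMF mean and standard deviation as the linear features and Shannon entropy as a stand-in nonlinear feature:

```python
# Sketch of stage 1: EMD decomposition of one EEG channel, then simple
# linear (mean, std) and nonlinear (Shannon entropy) features per IMF.
# PyEMD ("pip install EMD-signal") is an assumed library choice.
import numpy as np
from PyEMD import EMD

def shannon_entropy(x, bins=32):
    """Nonlinear feature: entropy of the amplitude distribution."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist / (hist.sum() + 1e-12)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def eeg_emd_features(signal, max_imfs=5):
    imfs = EMD()(signal)[:max_imfs]  # decompose into intrinsic mode functions
    feats = []
    for imf in imfs:
        feats += [imf.mean(), imf.std(), shannon_entropy(imf)]
    return np.asarray(feats)

# Toy usage: one second of synthetic 10 Hz "EEG" sampled at 256 Hz.
sig = np.sin(2 * np.pi * 10 * np.linspace(0, 1, 256)) + 0.1 * np.random.randn(256)
print(eeg_emd_features(sig).shape)
```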
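Stage 2 (speech): the paper specifies MFCC spectral features. A minimal sketch, assuming librosa as the toolkit and 13 coefficients pooled over frames (the coefficient count and pooling are illustrative assumptions):

```python
# Sketch of stage 2: MFCC spectral features of a speech recording,
# mean/std-pooled over time into a fixed-length descriptor.
import numpy as np
import librosa

def speech_mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)                     # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    # Pool frame-level coefficients into one vector per utterance.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```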
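Stage 3 (video): the abstract says facial texture features are extracted but does not name the descriptor. Local binary patterns (LBP) are a common texture choice and stand in here purely as an assumption, using scikit-image:

```python
# Sketch of stage 3: LBP texture histograms from grayscale face frames,
# averaged over the clip. LBP is an assumed descriptor, not the paper's.
import numpy as np
from skimage.feature import local_binary_pattern

def face_texture_features(gray_frame, P=8, R=1):
    lbp = local_binary_pattern(gray_frame, P, R, method="uniform")
    # Histogram of the P+2 uniform LBP codes as the frame descriptor.
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def video_texture_features(gray_frames):
    # Average the per-frame descriptors over the whole clip.
    return np.mean([face_texture_features(f) for f in gray_frames], axis=0)
```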
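Stage 4 (text): the abstract only says the pre-processed text is encoded with a "Transformer Net". A minimal sketch with PyTorch's built-in Transformer encoder; vocabulary size, depth, and mean pooling are illustrative assumptions:

```python
# Sketch of stage 4: textual features from pre-processed token ids via a
# small Transformer encoder with mean pooling over the sequence.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=10000, d_model=128, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, token_ids):                  # (batch, seq_len)
        h = self.encoder(self.embed(token_ids))    # (batch, seq_len, d_model)
        return h.mean(dim=1)                       # pooled text feature

feats = TextEncoder()(torch.randint(0, 10000, (2, 20)))  # -> (2, 128)
```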
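Fusion: the four feature sets are weighted and concatenated, with the weights (and feature subsets) chosen by the EMTOA optimizer. EMTOA itself is the paper's contribution and is not reproduced here; the sketch below shows only the fusion step, with placeholder weights where the optimizer's output would go:

```python
# Sketch of the weighted fusion step: normalize each modality's feature
# vector, scale by its weight, and concatenate. Weights are placeholders
# standing in for EMTOA's output.
import numpy as np

def weighted_fuse(feature_sets, weights):
    assert len(feature_sets) == len(weights)
    fused = [w * (f / (np.linalg.norm(f) + 1e-12))
             for f, w in zip(feature_sets, weights)]
    return np.concatenate(fused)

# Toy usage with arbitrary feature sizes for EEG, MFCC, face, and text.
eeg, mfcc, face, text = (np.random.randn(n) for n in (15, 26, 10, 128))
fused = weighted_fuse([eeg, mfcc, face, text], weights=[0.3, 0.25, 0.2, 0.25])
print(fused.shape)  # (179,)
```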
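Classifier: the HADN combines a dilated TCN with a Bi-LSTM and attention. A minimal PyTorch sketch under assumed layer sizes and dilation schedule (in the paper these hyperparameters are tuned by EMTOA); attention is implemented as simple additive pooling over time:

```python
# Sketch of the HADN classifier: dilated 1-D convolutions (TCN branch)
# feeding a bidirectional LSTM, with attention pooling before the head.
import torch
import torch.nn as nn

class HADN(nn.Module):
    def __init__(self, in_dim, hidden=64, num_classes=2):
        super().__init__()
        # TCN branch: stacked 1-D convolutions with growing dilation.
        self.tcn = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, dilation=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # per-step attention score
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                          # x: (batch, time, in_dim)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)   # (batch, time, hidden)
        h, _ = self.bilstm(h)                      # (batch, time, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)     # attention over time steps
        ctx = (a * h).sum(dim=1)                   # weighted temporal pooling
        return self.head(ctx)

# Toy usage: batch of 4 sequences of 30 fused feature vectors (dim 179).
logits = HADN(in_dim=179)(torch.randn(4, 30, 179))  # -> (4, 2)
```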