Depressive semantic awareness from vlog facial and vocal streams via spatio-temporal transformer

Digital Communications and Networks · Published 2024-06-01 · DOI: 10.1016/j.dcan.2023.03.007
Impact factor 7.5 · CAS Zone 2 (Computer Science) · JCR Q1 (Telecommunications)
Yongfeng Tao, Minqiang Yang, Yushan Wu, Kevin Lee, Adrienne Kline, Bin Hu
Citations: 0

Abstract

With the rapid growth of information transmission over the Internet, efforts have been made to reduce network load and improve efficiency. One such approach is semantic computing, which extracts and processes semantic information for communication. Social media allows users to share their current emotions, opinions, and life events from their mobile devices. Notably, people suffering from mental health problems are more willing to share their feelings on social networks. It is therefore necessary to extract semantic information from social media (vlog data) to identify abnormal emotional states and so facilitate early identification and intervention. Most existing studies do not consider spatio-temporal information when fusing multimodal information to identify abnormal emotional states such as depression. To address this gap, this paper proposes a spatio-temporal squeeze transformer for extracting semantic features of depression. First, a spatio-temporal module is embedded into the transformer encoder to obtain representations of spatio-temporal features. Second, a classifier with a voting mechanism is designed to help the model distinguish depression from non-depression. Experiments on the D-Vlog dataset show that the method is effective, reaching an accuracy of 70.70%. This work provides a foundation for future research on affect recognition in semantic communication based on social media vlog data.
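The abstract names two components, a squeeze module inside the transformer encoder and a voting classifier, without specifying either. The sketch below is not the authors' implementation; it only illustrates the general ideas under loudly stated assumptions: (1) the "squeeze" step behaves like a squeeze-and-excitation channel gate, simplified here to one scalar weight per channel instead of the usual two fully connected layers, and (2) the voting mechanism is a majority vote over binary per-segment predictions (1 = depression, 0 = non-depression). The names `squeeze_excite` and `majority_vote` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

# A feature sequence: T time steps (rows) x C channels (columns).
Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def squeeze_excite(features: Matrix, weights: Sequence[float], bias: float = 0.0) -> Matrix:
    """Simplified squeeze-and-excitation gate over a T x C sequence (hypothetical).

    Assumption: one scalar weight per channel stands in for the two fully
    connected layers of a standard squeeze-and-excitation block.
    """
    if not features:
        raise ValueError("features must contain at least one time step")
    num_steps = len(features)
    num_channels = len(features[0])
    if len(weights) != num_channels:
        raise ValueError("expected one weight per channel")
    # Squeeze: average each channel over the time axis.
    means = [sum(row[c] for row in features) / num_steps for c in range(num_channels)]
    # Excite: map each channel mean to a gate in (0, 1).
    gates = [_sigmoid(w * m + bias) for w, m in zip(weights, means)]
    # Rescale: multiply channel c at every time step by gate c.
    return [[row[c] * gates[c] for c in range(num_channels)] for row in features]


def majority_vote(predictions: Sequence[int]) -> int:
    """Combine binary predictions (1 = depression, 0 = non-depression) by majority vote.

    Assumption: a tie resolves to 0; no tie rule is given in the abstract.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0


if __name__ == "__main__":
    clip = [[1.0, 2.0], [3.0, 4.0]]
    # Zero weights and bias give every channel a gate of 0.5.
    print(squeeze_excite(clip, weights=[0.0, 0.0]))  # [[0.5, 1.0], [1.5, 2.0]]
    print(majority_vote([1, 0, 1]))  # 1
```

Under these assumptions, a video-level label would come from gating each clip's features, scoring segments with a classifier head (not shown), and voting over the resulting segment labels. Voting makes the final decision less sensitive to a few misclassified segments.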
