A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer

Loukas Ilias, D. Askounis, J. Psarras
DOI: 10.1109/BHI56158.2022.9926818
Published in: 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 27 September 2022
Citations: 4

Abstract

Alzheimer's disease (AD) is a progressive neurological disorder whose symptoms develop gradually over many years. It is also the main cause of dementia, which affects memory, thinking skills, and mental abilities. Researchers have increasingly turned to AD detection from spontaneous speech, since it constitutes a time-effective procedure. However, existing state-of-the-art multimodal approaches rely on early or late fusion and do not take the inter- and intra-modal interactions into consideration. To tackle these limitations, we propose end-to-end trainable deep neural networks that capture the inter- and intra-modal interactions. First, each audio file is converted to an image consisting of three channels: the log-Mel spectrogram, its delta, and its delta-delta. Next, each transcript is passed through a BERT model followed by a gated self-attention layer. Similarly, each image is passed through a Swin Transformer followed by an independent gated self-attention layer. Acoustic features are also extracted from each audio file. Finally, the representation vectors from the different modalities are fed to a tensor fusion layer that captures the inter-modal interactions. Extensive experiments on the ADReSS Challenge dataset indicate that the introduced approaches outperform existing research initiatives, reaching an accuracy of 86.25% and an F1-score of 85.48%.
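The three-channel audio image described in the abstract (log-Mel spectrogram plus its delta and delta-delta) can be sketched as follows. This is a minimal NumPy illustration: the deltas are simplified to first-order finite differences, whereas speech toolkits typically use regression-based deltas, and the paper's exact extraction settings (mel bins, hop length, etc.) are not specified here.

```python
import numpy as np

def delta(feat: np.ndarray) -> np.ndarray:
    """First-order temporal difference along the frame axis (axis 1).
    A simplified stand-in for the regression-based deltas used by
    speech toolkits such as librosa."""
    return np.diff(feat, axis=1, prepend=feat[:, :1])

def to_three_channel_image(log_mel: np.ndarray) -> np.ndarray:
    """Stack log-Mel, delta, and delta-delta into a (3, mels, frames) image."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return np.stack([log_mel, d1, d2], axis=0)

# Toy log-Mel spectrogram: 64 mel bins x 100 frames.
log_mel = np.random.randn(64, 100)
img = to_three_channel_image(log_mel)
print(img.shape)  # (3, 64, 100)
```

The resulting three-channel array has the same layout as an RGB image, which is what allows a vision backbone such as the Swin Transformer to consume it directly.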
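One plausible reading of the gated self-attention layers applied after BERT and the Swin Transformer is scaled dot-product self-attention whose output is modulated element-wise by a sigmoid gate computed from the input. The single-head form and the gate matrix `Wg` below are assumptions of this sketch, not the paper's exact layer.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_self_attention(X, Wq, Wk, Wv, Wg):
    """Scaled dot-product self-attention with a per-position sigmoid gate
    on the output -- one possible form of a 'gated self-attention layer'."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # attention weights
    gate = 1.0 / (1.0 + np.exp(-(X @ Wg)))        # sigmoid gate in (0, 1)
    return gate * (A @ V)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 16))                     # 10 tokens, dim 16
Wq, Wk, Wv, Wg = (rng.normal(size=(16, 16)) for _ in range(4))
out = gated_self_attention(X, Wq, Wk, Wv, Wg)
print(out.shape)  # (10, 16)
```

The gate lets the network suppress uninformative positions in the sequence before the modality representation is pooled and fused.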
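The tensor fusion layer named in the title can be illustrated with the common formulation in which each modality vector is augmented with a constant 1 before taking a multi-way outer product, so that the fused tensor contains all unimodal, bimodal, and trimodal interaction terms. Whether the paper uses exactly this variant is an assumption of the sketch.

```python
import numpy as np

def tensor_fusion(z_text, z_image, z_acoustic):
    """Fuse three modality vectors via a three-way outer product.
    Appending a constant 1 to each vector makes the product contain
    every uni-, bi-, and tri-modal interaction term."""
    a = np.concatenate([z_text, [1.0]])
    b = np.concatenate([z_image, [1.0]])
    c = np.concatenate([z_acoustic, [1.0]])
    fused = np.einsum('i,j,k->ijk', a, b, c)      # (|a|, |b|, |c|) tensor
    return fused.reshape(-1)                      # flatten for a classifier head

z = tensor_fusion(np.ones(4), np.ones(3), np.ones(2))
print(z.shape)  # (5 * 4 * 3,) = (60,)
```

Because the fused dimension grows multiplicatively, real systems keep the per-modality representation vectors small before this layer.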