A multimodal cross-transformer-based model to predict mild cognitive impairment using speech, language and vision

IF 7 2区 医学 Q1 BIOLOGY Computers in biology and medicine Pub Date : 2024-09-26 DOI:10.1016/j.compbiomed.2024.109199
{"title":"A multimodal cross-transformer-based model to predict mild cognitive impairment using speech, language and vision","authors":"","doi":"10.1016/j.compbiomed.2024.109199","DOIUrl":null,"url":null,"abstract":"<div><div>Mild Cognitive Impairment (MCI) is an early stage of memory loss or other cognitive ability loss in individuals who maintain the ability to independently perform most activities of daily living. It is considered a transitional stage between normal cognitive stage and more severe cognitive declines like dementia or Alzheimer’s. Based on the reports from the National Institute of Aging (NIA), people with MCI are at a greater risk of developing dementia, thus it is of great importance to detect MCI at the earliest possible to mitigate the transformation of MCI to Alzheimer’s and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods to predict and detect MCI. The majority of the existing research is based on unimodal data (e.g., only speech or prosody), but recent studies have shown that multimodality leads to a more accurate prediction of MCI. However, effectively exploiting different modalities is still a big challenge due to the lack of efficient fusion methods. This study proposes a robust fusion architecture utilizing an embedding-level fusion via a co-attention mechanism to leverage multimodal data for MCI prediction. This approach addresses the limitations of early and late fusion methods, which often fail to preserve inter-modal relationships. Our embedding-level fusion aims to capture complementary information across modalities, enhancing predictive accuracy. We used the I-CONECT dataset, where a large number of semi-structured conversations via internet/webcam between participants aged 75+ years old and interviewers were recorded. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). Our proposed architecture includes co-attention blocks to fuse three different modalities at the embedding level to find the potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layer. Experimental results demonstrate that our fusion method achieves an average AUC of 85.3% in detecting MCI from NC, significantly outperforming unimodal (60.9%) and bimodal (76.3%) baseline models. This superior performance highlights the effectiveness of our model in capturing and utilizing the complementary information from multiple modalities, offering a more accurate and reliable approach for MCI prediction.</div></div>","PeriodicalId":10578,"journal":{"name":"Computers in biology and medicine","volume":null,"pages":null},"PeriodicalIF":7.0000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in biology and medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0010482524012848","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Mild Cognitive Impairment (MCI) is an early stage of memory loss or other cognitive ability loss in individuals who maintain the ability to independently perform most activities of daily living. It is considered a transitional stage between normal cognitive stage and more severe cognitive declines like dementia or Alzheimer’s. Based on the reports from the National Institute of Aging (NIA), people with MCI are at a greater risk of developing dementia, thus it is of great importance to detect MCI at the earliest possible to mitigate the transformation of MCI to Alzheimer’s and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods to predict and detect MCI. The majority of the existing research is based on unimodal data (e.g., only speech or prosody), but recent studies have shown that multimodality leads to a more accurate prediction of MCI. However, effectively exploiting different modalities is still a big challenge due to the lack of efficient fusion methods. This study proposes a robust fusion architecture utilizing an embedding-level fusion via a co-attention mechanism to leverage multimodal data for MCI prediction. This approach addresses the limitations of early and late fusion methods, which often fail to preserve inter-modal relationships. Our embedding-level fusion aims to capture complementary information across modalities, enhancing predictive accuracy. We used the I-CONECT dataset, where a large number of semi-structured conversations via internet/webcam between participants aged 75+ years old and interviewers were recorded. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). Our proposed architecture includes co-attention blocks to fuse three different modalities at the embedding level to find the potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layer. Experimental results demonstrate that our fusion method achieves an average AUC of 85.3% in detecting MCI from NC, significantly outperforming unimodal (60.9%) and bimodal (76.3%) baseline models. This superior performance highlights the effectiveness of our model in capturing and utilizing the complementary information from multiple modalities, offering a more accurate and reliable approach for MCI prediction.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用语音、语言和视觉预测轻度认知障碍的多模态交叉变换器模型
轻度认知功能障碍(MCI)是记忆力减退或其他认知能力丧失的早期阶段,患者仍能独立完成大部分日常生活活动。它被认为是正常认知阶段与痴呆症或阿尔茨海默氏症等更严重认知能力下降之间的过渡阶段。根据美国国家老龄化研究所(NIA)的报告,患有 MCI 的人患痴呆症的风险更大,因此尽早发现 MCI 以减轻 MCI 向阿尔茨海默氏症和痴呆症的转化至关重要。最近的研究利用人工智能(AI)开发了预测和检测 MCI 的自动化方法。现有的大部分研究都是基于单模态数据(如仅语音或前音),但最近的研究表明,多模态数据能更准确地预测 MCI。然而,由于缺乏高效的融合方法,有效利用不同模态仍然是一个巨大的挑战。本研究提出了一种稳健的融合架构,利用共同关注机制进行嵌入式融合,从而利用多模态数据进行 MCI 预测。这种方法解决了早期和晚期融合方法的局限性,因为早期和晚期融合方法往往无法保留模态间的关系。我们的嵌入级融合旨在捕捉跨模态的互补信息,从而提高预测的准确性。我们使用了 I-CONECT 数据集,该数据集记录了 75 岁以上的参与者与访谈者之间通过互联网/网络摄像头进行的大量半结构化对话。我们介绍了一种基于深度学习的多模态语音-语言-视觉方法,用于区分 MCI 和正常认知(NC)。我们提出的架构包括共同关注块,用于在嵌入层融合三种不同的模态,从而在交叉转换层中找到语音(音频)、语言(转录语音)和视觉(面部视频)之间的潜在交互。实验结果表明,我们的融合方法在从 NC 检测 MCI 方面的平均 AUC 达到了 85.3%,明显优于单模态(60.9%)和双模态(76.3%)基线模型。这种优异的表现凸显了我们的模型在捕捉和利用来自多种模态的互补信息方面的有效性,为 MCI 预测提供了一种更准确、更可靠的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computers in biology and medicine
Computers in biology and medicine 工程技术-工程:生物医学
CiteScore
11.70
自引率
10.40%
发文量
1086
审稿时长
74 days
期刊介绍: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.
期刊最新文献
Lightweight medical image segmentation network with multi-scale feature-guided fusion. Shuffled ECA-Net for stress detection from multimodal wearable sensor data. Stacking based ensemble learning framework for identification of nitrotyrosine sites. Two-stage deep learning framework for occlusal crown depth image generation. A joint analysis proposal of nonlinear longitudinal and time-to-event right-, interval-censored data for modeling pregnancy miscarriage.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1