Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection

AI Open (IF 14.8) | Pub Date: 2025-01-01 | Epub Date: 2025-01-23 | DOI: 10.1016/j.aiopen.2025.01.003

Md Shofiqul Islam, Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Humayan Kabir Rana, Md. Saifur Rahman, Md. Munirul Hasan, AKM Azad, Ibrahim Abdullah, Mohammad Ali Moni

AI Open, volume 6, pages 12-44. Journal Article. https://www.sciencedirect.com/science/article/pii/S2666651025000038


Abstract

This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. We then compare the deep learning strategies used in COVID-19 analysis, evaluating them on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to the scientific understanding of multimodal applications of DL and their effectiveness in diagnosis. We implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis showed that the MobileNet model achieved the highest accuracy on both COVID-19 image data (99.97%) and speech (cough) data (93.73%), whereas the BiGRU model performed best in COVID-19 text classification, with an accuracy of 99.89%. More broadly, the findings suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.
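The accuracy figures quoted in the abstract are, under the usual definition, the fraction of test samples whose predicted label matches the reference label. The abstract does not describe the evaluation code, so the following is only a minimal sketch of that metric and of selecting the best model per modality. The function names `accuracy` and `best_model_per_modality` and the toy labels are hypothetical and are not taken from the paper; the three scores are the only per-modality figures the abstract states.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def best_model_per_modality(scores: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Map each modality to the model with the highest accuracy.

    `scores` is keyed by (modality, model_name).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (modality, model), acc in scores.items():
        if modality not in best or acc > best[modality][1]:
            best[modality] = (model, acc)
    return {modality: model for modality, (model, _) in best.items()}


# Toy labels for illustration only (1 = COVID-19 positive, 0 = negative).
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 0, 0, 1, 1]
toy_acc = accuracy(toy_true, toy_pred)  # 6 of the 8 labels match

# The only per-modality accuracies stated in the abstract.
reported = {
    ("image", "MobileNet"): 0.9997,
    ("speech", "MobileNet"): 0.9373,
    ("text", "BiGRU"): 0.9989,
}
winners = best_model_per_modality(reported)
```

Accuracy alone can be misleading on imbalanced diagnostic data, so a fuller evaluation would typically report sensitivity and specificity as well; the abstract does not say which additional metrics the authors used.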




Source journal: AI Open. CiteScore: 45.00. Self-citation rate: 0.00%.