Authors: Xin Yu Teh; Pauline Shan Qing Yeoh; Tao Wang; Xiang Wu; Khairunnisa Hasikin; Khin Wee Lai
DOI: 10.1109/ACCESS.2024.3472654
Journal: IEEE Access, vol. 12, pp. 146698-146717 (Journal Article)
Published: 2024-10-03
Impact factor: 3.4 (JCR Q2, Computer Science, Information Systems)
Full text: https://ieeexplore.ieee.org/document/10704620/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10704620
Citations: 0
Knee Osteoarthritis Diagnosis With Unimodal and Multi-Modal Neural Networks: Data From the Osteoarthritis Initiative
Knee osteoarthritis (OA) is a prevalent musculoskeletal condition that affects millions worldwide and poses significant health and economic burdens. It is characterized by the degeneration of joint cartilage, and its progression varies significantly among individuals, which makes prediction difficult. Previous studies on automated knee OA diagnosis have relied primarily on unimodal data, often overlooking the valuable information present in multi-modal data. Multi-modal learning, which integrates information from several modalities, is increasingly recognized for its potential to enhance diagnostic performance in medical applications. However, such models incur a higher computational load because of the additional data they require. This research investigates the feasibility of multi-modal neural networks for knee OA diagnosis by integrating structured demographic data with unstructured imaging data. Three unimodal deep learning models (InceptionV3, DIKO, and EfficientNetv2) were transformed into multi-modal architectures (MF_InceptionNet, MF_DIKO, and MF_Eff) to compare their diagnostic capabilities. The proposed multi-modal models share a common architecture: the unimodal model acts as the image feature extraction backbone, and separate embedding layers process the demographic data. The image features and demographic embeddings are combined into a unified vector before classification. Extensive experiments evaluated these models across different class categories and dataset sizes. MF_DIKO and InceptionV3 emerged as the best multi-modal and unimodal networks, respectively, with overall accuracies of 0.67 and 0.75 for 3-class severity classification. Contrary to existing literature, our findings show that unimodal neural networks using only imaging features outperform multi-modal networks, suggesting that unimodal models may suffice in certain applications.
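The fusion step described in the abstract (backbone image features plus a demographic embedding, concatenated into one vector before classification) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the class name `FusionClassifier`, the layer sizes, the random weights, and the three demographic inputs are all hypothetical, and the image feature vector is a stand-in for the output of a backbone such as InceptionV3.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class FusionClassifier:
    """Hypothetical feature-concatenation model; sizes are assumptions."""

    def __init__(self, image_dim, demo_dim, embed_dim=8, n_classes=3, seed=0):
        rng = random.Random(seed)
        # Separate embedding layer for the demographic inputs.
        self.embed = init_layer(demo_dim, embed_dim, rng)
        # Classification head over the fused (concatenated) vector.
        self.head = init_layer(image_dim + embed_dim, n_classes, rng)

    def forward(self, image_features, demographics):
        demo_embedding = relu(linear(demographics, *self.embed))
        fused = list(image_features) + demo_embedding  # unified vector
        return softmax(linear(fused, *self.head))


model = FusionClassifier(image_dim=16, demo_dim=3)
image_features = [0.5] * 16        # stand-in for backbone output
demographics = [0.62, 1.0, 0.27]   # hypothetical scaled demographic values
probs = model.forward(image_features, demographics)
print(probs)  # three class probabilities that sum to 1
```

In a trained model the backbone, embedding layer, and head would be learned jointly; the sketch only shows where the two modalities meet, which is the single concatenation in `forward`.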
IEEE Access (categories: Computer Science, Information Systems; Engineering, Electrical & Electronic)
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
About the journal:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, or interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.