Moamen Zaher, Amr S Ghoneim, Laila Abdelhamid, Ayman Atia
{"title":"融合 CNN 和注意力机制,改进实时室内人类活动识别,对家庭物理康复练习进行分类。","authors":"Moamen Zaher, Amr S Ghoneim, Laila Abdelhamid, Ayman Atia","doi":"10.1016/j.compbiomed.2024.109399","DOIUrl":null,"url":null,"abstract":"<p><p>Physical rehabilitation plays a critical role in enhancing health outcomes globally. However, the shortage of physiotherapists, particularly in developing countries where the ratio is approximately ten physiotherapists per million people, poses a significant challenge to effective rehabilitation services. The existing literature on rehabilitation often falls short in data representation and the employment of diverse modalities, limiting the potential for advanced therapeutic interventions. To address this gap, This study integrates Computer Vision and Human Activity Recognition (HAR) technologies to support home-based rehabilitation. The study mitigates this gap by exploring various modalities and proposing a framework for data representation. We introduce a novel framework that leverages both Continuous Wavelet Transform (CWT) and Mel-Frequency Cepstral Coefficients (MFCC) for skeletal data representation. CWT is particularly valuable for capturing the time-frequency characteristics of dynamic movements involved in rehabilitation exercises, enabling a comprehensive depiction of both temporal and spectral features. This dual capability is crucial for accurately modelling the complex and variable nature of rehabilitation exercises. In our analysis, we evaluate 20 CNN-based models and one Vision Transformer (ViT) model. Additionally, we propose 12 hybrid architectures that combine CNN-based models with ViT in bi-model and tri-model configurations. These models are rigorously tested on the UI-PRMD and KIMORE benchmark datasets using key evaluation metrics, including accuracy, precision, recall, and F1-score, with 5-fold cross-validation. Our evaluation also considers real-time performance, model size, and efficiency on low-power devices, emphasising practical applicability. The proposed fused tri-model architectures outperform both single-architectures and bi-model configurations, demonstrating robust performance across both datasets and making the fused models the preferred choice for rehabilitation tasks. Our proposed hybrid model, DenMobVit, consistently surpasses state-of-the-art methods, achieving accuracy improvements of 2.9% and 1.97% on the UI-PRMD and KIMORE datasets, respectively. These findings highlight the effectiveness of our approach in advancing rehabilitation technologies and bridging the gap in physiotherapy services.</p>","PeriodicalId":10578,"journal":{"name":"Computers in biology and medicine","volume":"184 ","pages":"109399"},"PeriodicalIF":7.0000,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fusing CNNs and attention-mechanisms to improve real-time indoor Human Activity Recognition for classifying home-based physical rehabilitation exercises.\",\"authors\":\"Moamen Zaher, Amr S Ghoneim, Laila Abdelhamid, Ayman Atia\",\"doi\":\"10.1016/j.compbiomed.2024.109399\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Physical rehabilitation plays a critical role in enhancing health outcomes globally. However, the shortage of physiotherapists, particularly in developing countries where the ratio is approximately ten physiotherapists per million people, poses a significant challenge to effective rehabilitation services. The existing literature on rehabilitation often falls short in data representation and the employment of diverse modalities, limiting the potential for advanced therapeutic interventions. To address this gap, This study integrates Computer Vision and Human Activity Recognition (HAR) technologies to support home-based rehabilitation. The study mitigates this gap by exploring various modalities and proposing a framework for data representation. We introduce a novel framework that leverages both Continuous Wavelet Transform (CWT) and Mel-Frequency Cepstral Coefficients (MFCC) for skeletal data representation. CWT is particularly valuable for capturing the time-frequency characteristics of dynamic movements involved in rehabilitation exercises, enabling a comprehensive depiction of both temporal and spectral features. This dual capability is crucial for accurately modelling the complex and variable nature of rehabilitation exercises. In our analysis, we evaluate 20 CNN-based models and one Vision Transformer (ViT) model. Additionally, we propose 12 hybrid architectures that combine CNN-based models with ViT in bi-model and tri-model configurations. These models are rigorously tested on the UI-PRMD and KIMORE benchmark datasets using key evaluation metrics, including accuracy, precision, recall, and F1-score, with 5-fold cross-validation. Our evaluation also considers real-time performance, model size, and efficiency on low-power devices, emphasising practical applicability. The proposed fused tri-model architectures outperform both single-architectures and bi-model configurations, demonstrating robust performance across both datasets and making the fused models the preferred choice for rehabilitation tasks. Our proposed hybrid model, DenMobVit, consistently surpasses state-of-the-art methods, achieving accuracy improvements of 2.9% and 1.97% on the UI-PRMD and KIMORE datasets, respectively. These findings highlight the effectiveness of our approach in advancing rehabilitation technologies and bridging the gap in physiotherapy services.</p>\",\"PeriodicalId\":10578,\"journal\":{\"name\":\"Computers in biology and medicine\",\"volume\":\"184 \",\"pages\":\"109399\"},\"PeriodicalIF\":7.0000,\"publicationDate\":\"2024-11-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers in biology and medicine\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1016/j.compbiomed.2024.109399\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in biology and medicine","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1016/j.compbiomed.2024.109399","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
Fusing CNNs and attention-mechanisms to improve real-time indoor Human Activity Recognition for classifying home-based physical rehabilitation exercises.
Physical rehabilitation plays a critical role in enhancing health outcomes globally. However, the shortage of physiotherapists, particularly in developing countries where the ratio is approximately ten physiotherapists per million people, poses a significant challenge to effective rehabilitation services. The existing literature on rehabilitation often falls short in data representation and the employment of diverse modalities, limiting the potential for advanced therapeutic interventions. To address this gap, This study integrates Computer Vision and Human Activity Recognition (HAR) technologies to support home-based rehabilitation. The study mitigates this gap by exploring various modalities and proposing a framework for data representation. We introduce a novel framework that leverages both Continuous Wavelet Transform (CWT) and Mel-Frequency Cepstral Coefficients (MFCC) for skeletal data representation. CWT is particularly valuable for capturing the time-frequency characteristics of dynamic movements involved in rehabilitation exercises, enabling a comprehensive depiction of both temporal and spectral features. This dual capability is crucial for accurately modelling the complex and variable nature of rehabilitation exercises. In our analysis, we evaluate 20 CNN-based models and one Vision Transformer (ViT) model. Additionally, we propose 12 hybrid architectures that combine CNN-based models with ViT in bi-model and tri-model configurations. These models are rigorously tested on the UI-PRMD and KIMORE benchmark datasets using key evaluation metrics, including accuracy, precision, recall, and F1-score, with 5-fold cross-validation. Our evaluation also considers real-time performance, model size, and efficiency on low-power devices, emphasising practical applicability. The proposed fused tri-model architectures outperform both single-architectures and bi-model configurations, demonstrating robust performance across both datasets and making the fused models the preferred choice for rehabilitation tasks. Our proposed hybrid model, DenMobVit, consistently surpasses state-of-the-art methods, achieving accuracy improvements of 2.9% and 1.97% on the UI-PRMD and KIMORE datasets, respectively. These findings highlight the effectiveness of our approach in advancing rehabilitation technologies and bridging the gap in physiotherapy services.
期刊介绍:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.